


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 








Volume XXX April, 1939 Number 4 








AN APPRAISAL OF THE BETTS VISUAL SENSATION 
AND PERCEPTION TESTS AS A SORTING DEVICE 
FOR USE IN SCHOOLS¹


LURA OAK 
Head of Research Learning Project, Massachusetts Department of Public Health 


In the early development of the program of the Research Learning 
Project, begun two years ago, an effort was made to learn which of the 
children included in the major study should be examined for possible 
vision correction. It was soon apparent that the Snellen chart tests 
as given by classroom teachers (in the first grade, with the illiterate E 
chart) were inadequate for this sorting; that teachers generally were 
dissatisfied with the results; and that they protested against giving 
the tests. A study of conditions under which the testing is done in 
the schools and the criticisms expressed by teachers and school 
administrators left no doubt that the present method of school vision- 
testing is highly unsatisfactory both from the standpoint of adminis- 
tration and effectiveness in detecting the particular children whose 
eyes should receive expert attention. 

It was observed that a number of schools had purchased a stereo- 
scope known as the Keystone Ophthalmic Telebinocular. They were 
experimenting with the vision tests used with the instrument which 
compose a part of the educational materials known as the Betts Ready 
to Read Tests.² The general impression to be gained from most of
the schools using these materials was an enthusiastic report of their 
value. The children’s response to the tests was said to be excellent 





1 This study was made by the staff of the Research Learning Project, Division 
of Child Hygiene: M. Luise Diez, M.D., Director. In addition to the writer the 
staff consists of the following: Albert E. Sloane, M.D., ophthalmologist; Miss 
Miriam Forster, M.A., research assistant; J. W. M. Rothney, Ed.D., consultant 
Statistician. The study was undertaken as a subsidiary investigation incidental 
to the main purpose of the Project which is to study the causes of early school 
failure. 

2 D.B. Series, Keystone View Company, Meadville, Pa. 



and the attractive and unusual nature of the materials added to the 
favorable general impression. 

The Keystone Company claims for the tests a background of 
research and the endorsement of eye experts.¹ The particular claim
which attracted our attention was that of the decisiveness with which 
children needing eye attention could be sorted out. 

In view of the favorable evidence and the importance of finding a 
practicable solution to the problem which immediately confronted our 
staff and, apparently, extends to school authorities everywhere, a
study was outlined for the purpose of investigating the efficiency of 
the Betts Visual Sensation and Perception tests (hereafter referred 
to as the Telebinocular tests) in sorting out school children who need 
ocular examination. The materials are designed and advertised for 
use by non-medical persons such as teachers and nurses without special 
training beyond such instruction and practice as can be given in an
hour’s time. It was thought that such a ready-made battery of tests, 
once its effectiveness for practical use was satisfactorily established, 
would serve the immediate purposes of the major study and that it 
might merit official endorsement leading to general adoption in the 
schools. 

The report which follows should not be construed as a criticism of 
the principles underlying the tests themselves or of the mechanical 
construction of the instrument. It should be kept in mind that 
several functions are claimed for the Telebinocular in connection with 
other types of materials. The present study is an investigation of one 
function only. No effort was made to substantiate or to discredit the 
theory of functional disorders of the eye with which these tests are 
concerned. The present study is directed primarily to the question: 
Does the vision testing material as it is dispensed and used in schools 
serve to sort out the children who should be referred to an eye specialist? 

Two groups of one hundred children each were included in the 
investigation. Group A was composed of children living in rural or 
semi-rural districts. Their ages ranged from nine to fifteen years. 
They were all suspected of vision difficulty by their teachers or were 
handicapped in reading. Sixty-five were boys, thirty-five were girls. 
Group B consisted of an entire fourth grade in a town school and all 
of the children in two rural schools. In age they ranged from six to 
fifteen years. The number of boys was fifty-one; girls, forty-nine. 





1 E. A. Betts is responsible for the special features of the tests which relate to
reading functions. 







The children included in this study probably present a fair sampling 
of the public-school population of Massachusetts exclusive of the large 
metropolitan areas. In socio-economic status they represent such 
groups as families on relief, fishermen, farmers, professional men, 
college professors, and business men. The majority attend village 
and rural schools in the western part of the State, the others live in a 
town on Cape Cod. 

The Telebinocular tests were given to two hundred children. In 
Group A the two tests were administered approximately a year apart 
and all were given by the same person, a member of the staff. Two 
school nurses administered these tests to the children of Group B. 
One gave them to all of the rural school children, and the other to the 
fourth-grade class in the town. For all of these children the second 
tests were given within a month of the first. Each nurse had received 
more than an hour’s instruction in the use of the tests and was 
instructed to study the manual¹ and to be guided by the directions
contained therein. Because of errors and omissions in recording, it 
was necessary to have a member of our staff retest and check about 
forty cases in Group B. In all instances there was no reference to the 
record of the previous test while the second was being given. The 
testing was done individually in private rooms. The results of these 
four hundred individual tests were tabulated to show the cases
in which the attention of an eye specialist was indicated by the
Telebinocular record.

The two hundred children were then examined by the staff oph- 
thalmologist who indicated in the report on each whether or not, 
according to his findings, the child should be referred to a specialist. 

The two records of the Telebinocular, taken singly and together 
and interpreted by several criteria, were then compared with those 
made by the physician. 

An explanation of the scoring procedures used in the Telebinocular 
tests, a description of the ophthalmologist’s screening tests, and the 
method used by him for making recommendations to the school 
authorities concerning the need for more complete ocular examination, 
require some elaboration in order to show the extent to which the 
investigation was carried. 

The first tabulations made of the Telebinocular tests, scored 
according to the manual of instructions, indicated an extraordinarily 





1 Betts, Emmett A.: Appendix C of The Prevention and Correction of Reading
Difficulties. Row, Peterson & Co., 1936. 




high number of cases to be sent to an eye specialist for diagnosis. 
Our findings corroborated other reports that there was a general 
confusion among those using the Telebinocular tests over the question 
of how to determine which children among the large numbers screened 
out should be sent to eye specialists. A representative of the manu- 
facturer’s staff was consulted and his services enlisted in order to check 
our records for possible errors of interpretation. Wide experience in 
giving the tests had made it possible for him to work out what he 
considered to be a more satisfactory “system” of scoring¹ than that
given in the manual. He was asked to go over the record sheets and 
to make independent recommendations on the basis of his plan for 
interpreting the scores. The tables below show comparisons between 
his interpretations, those made according to the manual, and the 
recommendations made by the ophthalmologist. In all instances 
where interpretations and recommendations were made from test 
records or examinations there was no reference to any records except 
those immediately under consideration. The tabulations were done 
by a third person whose accuracy was checked by an office assistant. 
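
The expert's scoring system, as he describes it in the footnote, can be read as a short set of decision rules. The following is a minimal sketch in Python under stated assumptions: the record layout and every name in it are hypothetical, and because the source gives no numerical limits for "marked divergence" in Test 6 or for failure of the fusion tests, both are passed in as the examiner's yes-or-no judgment. The leeway the expert allows for exceptional cases is not modeled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TelebinocularRecord:
    """Hypothetical record layout; the field names are not from the manual."""
    visual_efficiency: Tuple[float, float, float]  # Test 3, three subtests, per cent
    sharpness_lines_seen: Tuple[int, ...]          # Test 8, lines seen on each yellow ball
    lateral_imbalance_marked: bool                 # Test 6, as judged by the examiner
    fusion_failed: bool                            # fusion tests, as judged by the examiner


def refer_by_expert_system(rec: TelebinocularRecord) -> bool:
    """Return True when the expert's stated rules point to referral."""
    # Test 3: a record below 90 per cent on any subtest counts as failure.
    if any(score < 90 for score in rec.visual_efficiency):
        return True
    # Test 3: 105 per cent or higher on all three subtests is also a basis for referral.
    if all(score >= 105 for score in rec.visual_efficiency):
        return True
    # Test 8: failure to see three lines on any yellow ball.
    if any(lines < 3 for lines in rec.sharpness_lines_seen):
        return True
    # Test 6 is "usually" a basis for referral, and only if the fusion tests are also failed.
    return rec.lateral_imbalance_marked and rec.fusion_failed
```

Written this way, the rules make plain that only Tests 3 and 8 can refer a child on their own; Test 6 contributes only in combination with the fusion tests.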

Preparations for all the vision testing by the ophthalmologist 
were made beforehand on each occasion. Two consecutive days 
were usually given to these vision surveys in each community. An 
examining room was set up in any convenient space—a vacant school 





1 His own description of his “system” is briefly as follows: Tests 3 (Visual
Efficiency) and 8 (Sharpness of Image) are the only single tests for failure on which 
the child should be referred to an eye-specialist. In Test 3, a record lower than 
ninety per cent on any of the three subtests is considered failure and a record of 
one hundred five per cent or higher on all three sub-tests is considered a basis for 
referral. Failure to see three lines on any of the yellow test balls in Test 8 (Sharp- 
ness of Image) is always interpreted as a basis for referral. Marked divergence 
from the normal range in Test 6 (Lateral Imbalance) is usually considered a basis 
for referral if the fusion tests are also failed. Some leeway should be allowed for 
the interpretation of exceptional cases where referral might be made upon the 
basis of the entire series of tests without following the above criteria exactly. 

A further statement from this experienced person is of interest in this connec- 
tion since it points to a common problem met by those who seek to get adequate 
ocular attention when school children are referred for attention: “When children
are in a situation where it is likely they will be examined by a reliable practitioner 
making muscle and fusion tests, then all cases failing any single test, with the 
exception of Test 5 (Stereopsis) should be referred. Those failing Test 5 should 
be rechecked in six or eight weeks because failure in this test may be a symptom 
of a disturbance in the ocular reflex which may show up at some later date.” 
Retests for many of the cases were recommended before making the final decision 
regarding those needing a specialist’s attention. 













room, the Town Hall, a school auditorium. The children were brought
for examination in small groups and provision was made so that all 
had an opportunity to observe the procedures before they were tested. 
The fact that all were eager and willing to coöperate made it possible
for the ophthalmologist to examine a comparatively large number of 
cases in one day. The examination¹ included:

Visual Acuity, each eye separately and both eyes together
Interpupillary Distance
Near Point of Convergence
External Examination
Ocular Motility
Pupillary Reactions
Digital Tension
Ophthalmoscopy
Retinoscopy
Subjective Refraction (Fog test) with use of Snellen Charts and Astigmatic Charts
Distance and Near Phoria Test
Cover Test for Phoria


The criteria² set up by the ophthalmologist for determining the
recommendations to accompany his reports are not presented here in 
detail. They will appear in subsequent reports prepared for a medical 
journal. After each examination was completed, the ophthalmologist 
made one of the following notations:³ (1) No recommendation; (2)
Examination indicated; (3) Examination if school work warrants; 
(4) Yearly check-up. 

Because of the attendant difficulties in a public survey of this 
nature, no effort was made to examine the original two hundred cases
with the use of drops which make the examination easier by inducing 





1 Descriptions of the procedures used in the ophthalmologist’s examination
have been mimeographed and may be secured by request from the writer. 

2 The writer will be glad to furnish a copy of the criteria upon request.

3 No recommendation—When refractive error and phoria readings were less 
than amount set in criteria for referring. Examination indicated—When refractive 
error or phoria readings or both corresponded to the amount set in criteria for 
referring or exceeded these limits. Refer if school work warrants—When refractive 
error or phoria reading or both were slightly below the amount set for referring. 
It was felt that if the child was not succeeding in school he should be given the 
benefit of a correction. Yearly check-up—When child was already under the care 
of a specialist but in need of reexamination. 










a temporary paralysis of the ciliary muscle (a routine procedure in
the practice of most physicians, especially in cases of children). In
order to determine whether or not differences occur in the recom- 
mendations made by this examiner when he examines with and with- 
out drops using procedures identical with those used in the original 
cases, a group of twenty-five separate cases was examined in both 
ways.¹ In only one case was the recommendation different: before
the administration of the drops, this case was marked “Examination
if school work warrants”; afterwards the recommendation was
“Examination indicated.”

The ophthalmologist’s consistency in recommending cases for 
further examination was determined by making a second judgment 
from his original data without knowledge of his first. Of the one
hundred eleven cases treated in this manner, one hundred four
recommendations were exactly the same. In four cases in which there was
no recommendation at the time of the examination, he later
recommended that the child be examined if his school work
warranted. In three cases he advised an examination at the time he
made the record, but at the later observation he suggested the
examination only if the child’s school work was not satisfactory. In none
of these one hundred eleven cases was there a change from one extreme
diagnosis to the other, as from “examination recommended” to
“no examination recommended.”
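
The consistency figures in this paragraph reduce to simple arithmetic. A minimal sketch in Python, using only the counts stated above (the variable names are illustrative):

```python
# Counts taken from the paragraph above.
total_cases = 111
identical = 104
none_to_conditional = 4   # no recommendation, later "examination if school work warrants"
refer_to_conditional = 3  # examination advised, later "examination if school work warrants"

# The three groups should account for every case.
assert identical + none_to_conditional + refer_to_conditional == total_cases

exact_agreement = identical / total_cases
print(f"exact agreement: {exact_agreement:.1%}")
```

All seven disagreements involve the intermediate category, which is the sense in which no case moved from one extreme to the other.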

As a check of the ophthalmologist’s consistency in examinations 
of the same cases, he reéxamined twenty-five cases under very favor- 
able conditions within a month, usually without knowing that they 
had been examined before, and always without reference to his first 
record. This recheck revealed no variation in any case greater than 
the limits of the criteria which had been set up and indicates a con- 
sistency of one hundred per cent for the examining ophthalmologist. 
It cannot be inferred from this statement that any ophthalmologist 
can maintain one hundred per cent consistency for larger numbers of 
cases over longer periods of time and under less favorable conditions. 
The evidence, however, appears to indicate that further reéxaminations 
of our groups would not change the recommendations in any signifi- 
cant number of cases. This statement does not imply that reéxamina- 





1 After the routine examination had been made, one per cent homatropin was 
dropped into the conjunctival sac every five minutes for fifteen instillations. 
After waiting until the pupils were fully dilated and fixed, (at least one-half hour, 
usually an hour), the patient was examined again, to determine whether there were 
differences greater than the limits of the criteria being used. 







tions may not be necessary in such surveys or that ocular conditions 
of children may not be found to vary upon reéxamination. 

As a check upon the consistency of the instrument the results 
of the two tests (administered by different persons, but in each case 
the retest made by the person who gave the first test) were compared. 
When scored according to the manual, these records show that, 
although the time interval was as great as a year in almost one-half 
of the cases, the conclusions reached concerning referral were the same 
in eighty-one per cent of the cases. This indicates a high reliability 
for the tests which means that they tend to show the same results 
when repeated. 
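
The Group A counts in Table II, scored according to the manual, reproduce a figure of eighty-one per cent. Whether the percentage quoted above was computed on Group A alone is an assumption; the sketch below only shows that the tabulated counts are consistent with it.

```python
# Group A counts from Table II, scored according to the manual.
retest_outcomes = {
    "failed twice": 78,
    "passed twice": 3,
    "passed once, failed once": 12,
    "questionable once, passed once": 4,
    "questionable once, failed once": 3,
}

# Only the first two rows give the same referral conclusion on both tests.
same_conclusion = retest_outcomes["failed twice"] + retest_outcomes["passed twice"]
total = sum(retest_outcomes.values())
agreement = same_conclusion / total
print(f"{same_conclusion} of {total} cases agree ({agreement:.0%})")
```

Nearly all of this agreement comes from children who failed both times; only three passed twice.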

By way of summary before presenting our results we may describe 
our populations and procedures as follows: 

(1) The subjects consisted of two hundred children, six to fifteen 
years of age. One hundred of these were handicapped in reading or 
were suspected of vision difficulty; the other hundred were selected 
at random. 

(2) All subjects were tested twice with the Betts Visual Perception 
and Sensation Cards (the DB series) administered by examiners as 
competent as those advised by the manufacturer of the tests. Scores 
were obtained by following the prescribed manual of directions and 
also by reference to a scoring system devised by an expert from the 
instrument company. 

(3) The subjects were then examined by an ophthalmologist who 
proved to be one hundred per cent consistent in making extreme 
diagnoses from the same data and who had discovered that examina- 
tion with and without drops in the eyes made only one slight difference 
in his recommendations in twenty-five cases. 

(4) The findings of the ophthalmologist’s examination in each 
case were tabulated to indicate whether or not the child should be 
referred for the attention of an eye specialist. Likewise the results 
of the Telebinocular tests were recorded to indicate which had passed 
and which had not passed and should, therefore, be referred for ocular 
attention. A comparison of the findings from these two sources was 
then made. The results appear in Tables I, II, and III.

In general these percentages correspond to findings in other vision 
surveys. They suggest the rough limits to be expected of a testing 
instrument designed to sort out those cases which should be referred 
to a specialist. 

Table II shows that of one hundred selected cases only three 
passed both Telebinocular tests when scored according to the 










TABLE I. PERCENTAGE OF GROUPS A AND B REFERRED ON THE BASIS OF
OPHTHALMOLOGIST’S EXAMINATION

Ophthalmologist’s                      Per cent            Per cent
recommendations                        Group A (N 100)     Group B (N 100)

Refer¹ to eye specialist                     30                   7
Refer if school work warrants                 6                   4
Yearly check-up                              11                   6
Passed¹                                      53                  83

1 “Refer” means the ophthalmologist recommended that the subject needed
further attention of an eye specialist; “Passed,” that he did not need such attention.


TABLE II. RESULTS OF TWO TELEBINOCULAR TESTS SCORED ACCORDING TO
MANUAL AND EXPERT COMPARED WITH RESULTS OF OPHTHALMOLOGIST’S
EXAMINATION. THE SUBJECTS ARE ONE HUNDRED SCHOOL CHILDREN,
AGED NINE TO FIFTEEN, HANDICAPPED IN READING OR SUSPECTED
OF VISION DIFFICULTY

                                          Ophthalmologist’s recommendations
Telebinocular tests¹               N      Refer²   If school     Yearly      Passed²
                                                   work          check-up
                                                   warrants

Failed tests twice:
  According to manual              78       26        5             9           38
  According to expert              35       14        2             4           15
Passed tests twice:
  According to manual               3        3        0             0            0
  According to expert              19        4        1             1           13
Passed once, failed once:
  According to manual              12        0        1             2            9
  According to expert              37       11     [illeg.]         5           17
Questionable once, passed once:
  According to manual               4        0        0             0            4
  According to expert               6        1        0             1            4
Questionable once, failed once:
  According to manual               3        1        0             0            2
  According to expert               3        0        0             0            3

1 The entire DB series of the Betts Visual Sensation and Perception tests.
2 “Refer” means the ophthalmologist recommended that the subject needed
further attention of an eye specialist; “Passed,” that he did not need such
attention.







manual. These three were all among those referred for examination 
by the ophthalmologist. 

The remaining ninety-seven either failed or were questionable on 
one or both of the Telebinocular tests when scored according to the 
manual. Of these ninety-seven the ophthalmologist passed fifty-two. 


TABLE III. RESULTS OF TELEBINOCULAR TEST¹ SCORED ACCORDING TO MANUAL
AND KEYSTONE EXPERT COMPARED TO RESULTS OF OPHTHALMOLOGIST’S
EXAMINATION. THE SUBJECTS ARE ONE HUNDRED SCHOOL CHILDREN,
AGED SIX TO FIFTEEN, SELECTED AT RANDOM

                                          Ophthalmologist’s recommendations
Telebinocular test²                N      Refer³   If school     Yearly      Passed³
                                                   work          check-up
                                                   warrants

Failed:
  According to manual              74        7        4             3           60
  According to expert              55        7        2             4           42
Passed:
  According to manual              11        0        0             1           10
  According to expert              45        0        2             2           41
Questionable:
  According to manual              15        0        0             2           13
  According to expert               0        0        0             0            0

1 These cases were tested twice, but only one complete test record was available
in most instances because of errors or omissions made by the teachers or nurses who
gave the tests. These errors were made in spite of the fact that each teacher
or nurse had been given the usual instruction in the use of the instrument and had
followed directions given in the manual. Frequently Test 8 (“Sharpness of
Image”) was omitted or improperly recorded by the nurse or teacher. In such
cases the missing data were obtained by a second examiner, and the subtest scores
combined to give a complete test record for each case.

2 The entire DB series of the Betts Visual Sensation and Perception tests.

3 “Refer” means the ophthalmologist recommended that the subject needed
further attention of an eye specialist; “Passed,” that he did not need such
attention.


According to the expert, nineteen passed both Telebinocular tests.
Of these the ophthalmologist passed thirteen. The remaining eighty-one
either failed or were questionable on one or both Telebinocular
tests. Of these eighty-one cases the ophthalmologist passed
twenty-nine.







Table III shows that of one hundred cases selected at random, 
eleven were passed according to the manual. Of these eleven, ten 
were passed by the ophthalmologist. The remaining eighty-nine
either failed or were questionable, according to the manual. Of
these eighty-nine the ophthalmologist passed seventy-three.

According to the expert, forty-five passed the Telebinocular test. 
Of these forty-five, the ophthalmologist passed forty-one. The 
remaining fifty-five failed the Telebinocular test, according to the 
expert. Of these fifty-five, the ophthalmologist passed forty-two. 
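
The Group B comparison can be summarized as a two-by-two table for each scoring method. The sketch below is a minimal Python illustration, not the authors' own analysis. It assumes that every child the ophthalmologist did not pass ("Refer," "Refer if school work warrants," or "Yearly check-up") counts as needing attention, and that "questionable" Telebinocular records are grouped with failures. The terms sensitivity and specificity are modern vocabulary that the report itself does not use, and the function name is hypothetical.

```python
def screening_summary(pass_passed, pass_not_passed, flag_passed, flag_not_passed):
    """Summarise a two-by-two comparison of a screening test with a reference examination.

    pass_* : children the screening test passed
    flag_* : children the screening test failed or marked questionable
    *_passed / *_not_passed : the ophthalmologist's verdict
    """
    needing_attention = pass_not_passed + flag_not_passed
    not_needing_attention = pass_passed + flag_passed
    return {
        "sensitivity": flag_not_passed / needing_attention,
        "specificity": pass_passed / not_needing_attention,
        "flagged": flag_passed + flag_not_passed,
        "flagged_needlessly": flag_passed,
        "missed": pass_not_passed,
    }


# Group B counts from Table III.
manual = screening_summary(pass_passed=10, pass_not_passed=1,
                           flag_passed=60 + 13, flag_not_passed=(7 + 4 + 3) + 2)
expert = screening_summary(pass_passed=41, pass_not_passed=4,
                           flag_passed=42, flag_not_passed=7 + 2 + 4)

for name, result in (("manual", manual), ("expert", expert)):
    print(name, result)
```

Under these assumptions the manual scoring flags 89 of the 100 children, 73 of whom the ophthalmologist passed, while the expert scoring flags 55, of whom 42 were passed, and lets through 4 of the 17 children the ophthalmologist did not pass.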

It can be seen that comparisons of these findings indicate both 
qualitative and quantitative disparities. The study points to the 
fact that the Telebinocular test sorts out too many cases for practical 
purposes, and also that it misses cases needing to be referred for ocular 
attention. It answers negatively the question posed at the beginning 
of our study: “Does the vision testing material as it is dispensed and
used in schools serve to sort out the children who should be referred
to an eye specialist?”


SELECTED REFERENCES 


Betts, E. A.: Prevention and Correction of Reading Difficulties. Row, Peterson &
Co., 1936.

Forster, Miriam: “The Keystone Ophthalmic Telebinocular.” Commonhealth,
Vol. xxiv, 1937, pp. 184-188.

Gates, A. I. and Bond, G. L.: “Reliability of Telebinocular Tests of Beginning
Pupils.” Journal of Educational Psychology, Vol. xxvii, 1937, pp. 31-36.

Kempf, G. A., Jarman, B. L. and Collins, S. D.: “A Special Study of the Vision of
School Children.” Public Health Reports, July 6, 1928, Government Printing
Office, Washington, D. C., pp. 1713-1739.

Peters, G. A.: “An Appraisal of Visual Defects of Children in Indiana.” Journal
of Indiana State Medical Association, Vol. xxxi, No. 5, 1938, pp. 237-238.










THE CORRELATION BETWEEN AGE AT ENTRANCE 
AND SUCCESS IN COLLEGE 


PAUL S. DWYER
University of Michigan 


1. INTRODUCTION 


The relation between age and college success has been the subject 
of many investigations during the past quarter of a century. It is 
the aim of this article (1) to present a statement of the results of these 
investigations, (2) to examine the correlations which have been found, 
(3) to show how the subcorrelation technique is useful in studying this 
problem, and (4) to make suggestions for future studies. 

In making statistical studies we must have precise definitions and 
measures. In the problem under investigation the important measures
are those of age and of college success. 

The two usual methods of measuring age are (1) years since birth 
as indicated by last birthday and (2) years since birth as indicated by 
the nearest birthday. The first of these measures seems to have been 
used by most of the investigators. Other measures are (3) age in years 
and months and (4) age in months. Each of these latter measures can
be reduced to one of the others for purposes of comparison. 

The situation with reference to measures of college success is not as 
simple. The most commonly accepted measure is (A) the academic 
average as weighted by honor points, and this has been used by many 
authors. Other less objective measures of academic success, such as 
attainment of honor grades, election to Phi Beta Kappa, and gradua- 
tion with honors, have been used. 

College success is also measured by (B) the number of semesters the 
student remains in college. This measure has been used frequently 
in the study of age, usually as a supplement to (A). 

Investigations using (A) and (B) have led quite universally to the 
conclusion that the younger students do much better than the older 
ones. The acceptance of this conclusion leads to a consideration of the 
social adaptation of the younger group. This question has been 
investigated by many authors either by using an objective measure, 
(C) disciplinary action or (D) miscellaneous measures of social adapta- 
tion. In presenting a summary of previous investigations, the results 
of those using (D) are not reported, as the methods used are not 
pertinent to the methods of this paper. It appears that the problem 


of social adaptation of young students warrants a further summary of 


the previous investigations as well as the collection of new material 
from various institutions. 

The method of attacking the problem must be decided upon as soon 
as the basic measures are determined. The most common attack is 
(1) the use of the age as the independent variable, the scholastic 
success as the dependent variable, and the computation of the average 
scholastic success for the different ages. A second method (2) results 
in an interchange of the two variables and the computation of the 
average of those falling in different classes. Another method (3) calls 
for the computation of the Pearson product-moment correlation coeffi- 
cient as a measure of the extent of the relationship between the two 
variables without making any implication as to which variable is 
independent.
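
Methods (1) and (3) can be illustrated briefly. The sketch below is a minimal Python example; the function names are hypothetical and the data are made up for illustration, not taken from any study cited here.

```python
from collections import defaultdict
from math import sqrt


def mean_success_by_age(ages, grades):
    """Method (1): average scholastic success for each age."""
    buckets = defaultdict(list)
    for age, grade in zip(ages, grades):
        buckets[age].append(grade)
    return {age: sum(g) / len(g) for age, g in sorted(buckets.items())}


def pearson_r(x, y):
    """Method (3): Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative data: age at entrance and grade average.
ages = [16, 17, 17, 18, 18, 19, 20, 21]
grades = [3.4, 3.1, 3.3, 2.9, 3.0, 2.6, 2.7, 2.5]

print(mean_success_by_age(ages, grades))
print(pearson_r(ages, grades))
```

Method (1) keeps the shape of the relation visible age by age, whereas method (3) compresses it into a single number and treats neither variable as independent.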

Since the results of studies carried on according to these methods 
show the quite universal agreement that the younger students excel, it 
is logical to study (4) the scholastic success of the young student. It is 
evident that (4) is a special case of (1). 

Similarly it is sensible to study (5), the age of “successful” students.
This method, which is a special case of (2), has been used by a
number of authors. 

It is possible, too, to find (6) the correlation coefficient connecting 
age and scholastic success for the “young” group and the “old” group
separately. This method, which is a special case of (3), does not seem 
to have been used extensively previously. It is used in the later parts 
of this article in presenting a better picture of the extent of relationship 
existing between age and success in college. 
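
Method (6) amounts to computing the coefficient within each age group instead of over the whole population. The sketch below is a minimal Python illustration; the cutoff age, the function names, and the data are all assumptions made for the example, since the text has not yet defined where the “young” group ends.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def subcorrelations(ages, grades, cutoff):
    """Correlate age with success separately below and at-or-above a cutoff age."""
    young = [(a, g) for a, g in zip(ages, grades) if a < cutoff]
    old = [(a, g) for a, g in zip(ages, grades) if a >= cutoff]
    r_young = pearson_r(*zip(*young))
    r_old = pearson_r(*zip(*old))
    return r_young, r_old


# Made-up data: grades fall with age among the young and rise among the old.
ages = [16, 17, 18, 19, 20, 22, 24, 26, 28]
grades = [3.6, 3.3, 3.0, 2.7, 2.4, 2.6, 2.9, 3.1, 3.4]

r_all = pearson_r(ages, grades)
r_young, r_old = subcorrelations(ages, grades, cutoff=21)
print(r_all, r_young, r_old)
```

In data of this shape the overall coefficient is small even though the relation within each group is strong, which is the kind of picture a single coefficient cannot give.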


2. SUMMARY OF EARLIER WORK 


In the brief summary of previous investigations, which is included, 
the numbers and letters found in the preceding paragraphs are used to 
indicate the measures of college success and the methods used. No 
indication is given of the measure of age, as such information would be 
of no real advantage and would only serve to make the notation more 
complicated. Three entries accompany the reference to each investigation.
The first entry refers to the bibliographical list, the second
indicates the measure of scholastic success, and the third indicates the 
method used. For instance, in the next paragraph, the entries (1:A,3) 
indicate that the reference is given by bibliographical reference 1, 
that the measure of academic success is college record (A), and that the 
method is that of the coefficient of correlation (3). The references are 
arranged in approximate chronological order. 







Forsyth (1:A,3) appears to be the first who used the correlation 
coefficient in finding the relationship between age and college success. 
He used data obtained from thirteen hundred six men and six hundred 
forty-four women registered at the University of Illinois in 1909-1910. 
It appears that all classes—freshmen, sophomores, juniors, and seniors 
—were represented, since there is a specific statement that all the 
women enrolled in the university were included. Law students and 
students of agriculture were included. The measure of college success 
was academic average rather than academic average weighted by honor 
points. No grades received in military training or physical training
were used. The following correlations were found:


r = .0938 ± .0185 for men.
r = .1996 ± .0260 for women.


Forsyth made the comment, “It is an interesting fact that for both 
sexes, cases of extremely isolated and advanced ages showed excellent 
averages almost invariably.” 
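
The figure following each of Forsyth's coefficients appears to be its probable error, although the text does not say so. Assuming the classical formula P.E. = 0.6745(1 - r²)/√N, a formula not given in the source, the sketch below recomputes the two values from the numbers of men and women stated above. The function name is illustrative.

```python
from math import sqrt


def probable_error_of_r(r, n):
    """Classical probable error of a correlation coefficient (assumed formula)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


men = probable_error_of_r(0.0938, 1306)
women = probable_error_of_r(0.1996, 644)
print(f"men: {men:.4f}, women: {women:.4f}")
```

On this assumption the value for men agrees with the printed .0185, while the value for women comes out near .0255 rather than the printed .0260, so the reading should be treated as probable rather than confirmed.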

Holmes (2:A,4:C,4) made a study of those less than seventeen 
years of age who entered Harvard during the interval 1902-1912. 
Using the amount of disciplinary action and graduation honors as 
different measures he came to the conclusion, “The college gets better
results with less friction, from younger men than from older men.” 
He found a general decrease in college success from the youngest group 
up to the age of twenty-one. 

A. L. Jones (3:A,1) studied the scholastic records of two hundred 
eighty-seven freshmen of all ages entering Columbia College in 1915-1916.
He found a uniform pattern of decreasing grades from fifteen
to twenty-two with the single exception that the fifteen-year-olds had 
an excess of F’s. 

Pittinger (4:A,1:A,2) studied the men and women students entering
the University of Minnesota in 1910 and 1911. He found that the 
lowest grades were made by those nineteen to twenty-two years of age. 
The older students did somewhat better, but the trend was not 
uniform. The students who were under eighteen years did much 
better and stayed in college longer. The same tendency was shown by 
the women, but not to the same extent. He attributed the superior 
achievement of the young group to superior mental ability. 

Terman (5:A,3) reported in 1921 a study of the records of four
hundred ninety-nine Stanford students and found a correlation of
-.18. He also found that the younger entrants had more native
ability.







Husband (6:A,4:B,4) reported the records of two hundred twenty- 
four students less than seventeen years of age, who entered Dartmouth 
College (1901-1922). He found large percentages of them successful 
on the basis of graduation, reception of honors, election to Phi Beta 
Kappa. He, too, found high mental alertness. 

Thornberg (7:A,1) studied the records of freshmen entering the 
State College of Washington in 1921 and 1922. His results showed the 
now familiar pattern of low averages for those aged nineteen to twenty- 
one, while those aged seventeen and twenty-two had higher averages. 

M. O. Wilson (8:A,3) studied two hundred fifty freshmen women
entering the University of Oklahoma in 1923 and found a correlation of
-.030 ± .043 between age and grade while correlation between age
and “brightness” gave -.123 ± .041.

Odell (9:A,1) reported a wide survey of nearly two thousand stu- 
dents entering different colleges in Illinois in 1924. He found a correla- 
tion of —.23. He also computed correlations between age and grades 
in various subjects. About half of these were too small to be sig- 
nificant. The significant ones were all negative. 

Zeigel (10:A,5) reported in 1927 regarding the high-school honor 
students entering the University of Missouri. He found the average 
age of this group one and six-tenths years less than that of an unselected 
group of freshmen. 

Cooper (11:B,1) published a report in 1928 in which he analyzed 
the records of eight hundred seventy-one freshmen entering the 
University of Texas. He formed groups on the basis of early elimina- 
tion from college and found the average age of each group. The 
results favored the younger students. 


Gowen and Gooch (12:A,3) investigated the records of nine
hundred twenty-seven graduates of the University of Maine who
entered in the interval 1909 to 1917. They made correlations of age 
with grades in different types of subject-matter and found some posi- 
tive and some negative correlations. Most of the correlations were 
not large enough to be significant. 

Bear (13:A,1) examined the records of one hundred seventy-two 
freshmen at Center College in 1925-26. He found the standard 
pattern, decreasing averages for ages fifteen to twenty-one and increas- 
ing averages for ages twenty-one to twenty-five. 

E. M. Lloyd-Jones (14:A,1) described the situation at North- 
western University as indicated by the freshmen (Liberal Arts) 
entering classes of 1925, 1926, 1927. She found that those less than
seventeen had highest averages, those seventeen to nineteen came next,
while those twenty and up had the smallest averages. 

Ruth Brown (15:A,1:A,3) examined the age effect at the Uni- 
versity of Michigan as illustrated by one thousand three hundred 
twelve students entering in 1927. She found a correlation of —.11 
and indicated that the men show a higher degree of correlation of grade 
with age than do women. 

Gray (16:A,4) made a thorough study of one hundred fifty-four 
students entering Barnard College while less than sixteen years of age. 
He made his report in 1930 and stated, among other things, that the 
young student group is “superior in intelligence, superior in scholarship, 
has fewer failures, and has more academic honors than does the 
remaining student group.”

An interesting report is that given by Klein (17:B,1) in 1930 regard- 
ing a survey of graduates and non-graduates of a large number of 
land-grant colleges and universities. He divided the groups (grad- 
uates and non-graduates) according to enrollment in schools of Agri- 
culture, Engineering, Home Economics, Arts and Science, and 
Education. He found that the non-graduates enter college at later 
ages than do the graduates. 

Crawford (18:A,3) reported in 1930 the coefficient of —.37 for 
those entering Yale University under “admissions plan A.”

Payne (19:A,4) studied the young freshmen entering the College 
of the City of New York in 1929 and 1930. He found that “young
students are more intelligent, are better college material, and after the
first term fit into college routine much better than older students.”

Manson (20:A,1:A,3) made a detailed study of the situation at the 
University of Michigan as indicated by the entering freshmen of 1926, 
1927. She computed correlation coefficients for different subgroups 
as indicated by year of entrance and college registration. 





                                     Literary college    Engineering     Architecture
                                       1926     1927     1926    1927    1926    1927
Age and first semester grades ....     -.11     -.07     -.09    -.13     .02     .05
Age and second semester grades ...     -.08     -.03     -.08     .03    -.14    -.03




















Harris (21:A,3) studied a relatively homogeneous group of four 
hundred fifty-six entering the College of the City of New York in 1929.
He found a correlation coefficient of -.16. His article includes a
good bibliography. 

Remmers (22:A,5) reported in 1931 the results of his study of 
five hundred thirty-one distinguished students (upper one-sixth) at 
Purdue University. He found the average age of the distinguished 
students to be approximately equal to that of other groups, but he 
also found that the distinguished group was composed of two sub- 
groups, students who were younger and students who were older than 
the average. 

Odell reported a study in 1933 (23:A,3:B,4). He studied the Illi- 
nois students of age sixteen to twenty-one and found a correlation of 
—.24 + .01. He studied the group age sixteen to seventeen and 
reported that the students stay longer and that larger percentages 
graduate. 

A study by Moore (24:4) dealt primarily with the young high- 
school graduate rather than the young college entrant. A summary 
of previous studies was reported and a good bibliography provided. 
His study is closely allied to the Pennsylvania testing program.

F. S. Beers (25:A,1) has recently used the scores in the national
sophomore testing program as a measure of college success. He 
found the basic pattern described above though the break appeared 
about age twenty-three or twenty-four, since the students were 
sophomores and not entering freshmen. Beers presented in graphical 
form the variations in test scores on general culture, general mathe- 
matics, general science, contemporary affairs, literary acquaintance, 
and English. The chief variations from the general scholastic record 
pattern described above resulted from low scores in contemporary 
affairs by the youngest group and a decrease, rather than an increase, in 
scores on general mathematics and general science by the oldest group. 

The general pattern seems to be clear, but the variety of methods 
makes specific comparison of results almost impossible. A summary 
measure of extent of relationships is needed. It is usual to use a 
Pearsonian coefficient for this purpose. Those who have used it 
have found quite consistent results. The correlation for men is 
usually a significant negative quantity, while the correlation for women 
is usually a negative correlation though not as large as that for men. 
The decided exception is the study of Forsyth (1). 


3. UNIVERSITY OF MICHIGAN RESULTS 


Under the direction of Vice-President C. S. Yoakum, the Office of
Educational Investigations of the University of Michigan has been
collecting data during recent years on many aspects of the problem
of the prediction of college success. One of the variables which has 
been regularly recorded is that of age. A large number of correla- 
tion coefficients have been computed which show the relationship 
between age and scholarship. These have been taken from the files 
and are given in tabular form below. The results for the entering 
classes of 1926 and 1927 have previously been reported in the Brown
(15) and Manson (20) studies. Insignificant coefficients, as judged
by the formula

\[
3PE_r = 3(.6745)\,\frac{1 - r^2}{\sqrt{N}},
\]

are inclosed in parentheses.
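
The criterion may be illustrated numerically. The following is a minimal sketch and not part of the original computations; the function names `probable_error` and `is_significant` are hypothetical, and the example value is the coefficient for men quoted from Forsyth (1) with N = 1306.

```python
import math


def probable_error(r: float, n: int) -> float:
    """Probable error of a Pearson coefficient: .6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


def is_significant(r: float, n: int) -> bool:
    """Three-probable-error rule: r is retained only if |r| exceeds 3 * PE_r."""
    return abs(r) > 3.0 * probable_error(r, n)


# Forsyth's coefficient for men, r = .0938 with N = 1306 (values from the text).
pe_men = probable_error(0.0938, 1306)
print(round(pe_men, 4))               # 0.0185, agreeing with the quoted ± .0185
print(is_significant(0.0938, 1306))   # True
```

Under this rule a coefficient is inclosed in parentheses whenever `is_significant` would return `False` for its r and N.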


TABLE I. THE CORRELATIONS BETWEEN AGE AND FIRST SEMESTER AVERAGE
FOR DIFFERENT GROUPS AT THE UNIVERSITY OF MICHIGAN DURING
THE PERIOD 1928-1933

School                                 1928      1929      1930      1931      1932      1933
Engineering                                                -.168     -.211     -.137     -.223
Literary science and arts (men)        -.134    (-.075)    -.252     -.178     -.174     -.155
Literary science and arts (women)      (.029)    (.062)    (.048)    -.146    (-.044)    -.117
Literary science and arts (total)                                    -.170     -.139     -.151
Total all schools                      -.093    (-.037)    -.133     -.096     -.122

[The labels of the first, third, and fourth rows are illegible in the source and are
inferred from Tables II and IV; the column placement of entries in the partly filled
rows is likewise inferred.]











TABLE II. CORRELATIONS BETWEEN AGE AND ACADEMIC RECORD FOR STUDENTS
ENTERING THE UNIVERSITY OF MICHIGAN IN THE PERIOD 1927-1930

                       Freshman              Sophomore               Junior
                    First     Second      First     Second      First     Second
                   semester  semester    semester  semester    semester  semester
1927 total ....     -.108
1928 men ......     -.134     -.158
1928 women ....     (.029)   (-.060)
1928 total ....     -.093     -.137     (+.002)   (-.052)     (-.037)   (-.036)
1929 men ......    (-.075)    -.096
1929 women ....     (.062)    (.008)
1929 total ....    (-.037)    -.072     (-.036)   (-.014)
1930 total ....     -.133     -.097



Information is also available at the University of Michigan with 
regard to the size of the correlation coefficient between age and 
various semester averages. These results are presented in Table II. 

An analysis of the Michigan correlation charts reveals the pattern 
described by those who have not used the method of correlation. The 
correlation coefficients also are in general agreement with those pre- 
viously reported. The coefficients seem to be within the range indi-
cated by the results of Crawford (18) and Gowen and Gooch (12).
They appear to be definitely out of line with the results given by 
Forsyth (1). 

It would appear that the correlation between age and scholarship 
is significantly negative for men’s groups and for total groups at 
Michigan. The corresponding correlation for women, though usually 
negative, is hardly significant. It also appears that the correlation 
becomes smaller in absolute value as the class progresses through col- 
lege, although the results above are not complete enough to warrant 
any definite conclusion. Even if this is typical of Michigan, it may 
not be typical of other institutions. Future investigations may 
clarify this point. 


4. THE USE OF THE SUBCORRELATION COEFFICIENT 


It is evident that the Pearsonian coefficient is not a proper sum-
mary measure of the correlation between age and scholastic success.
Almost all authors report that the “young” group and the “old”
group have higher averages than the “middle” age group. This
describes a non-linear pattern. The correlation ratio could be used
as a proper measure of relationship in this case. It has not been
generally used so far in studying this question. The awkwardness
of computation and the fact that it does not provide a linear predic-
tion equation prejudice one against its use. It is here suggested that
“subcorrelation” may be properly used to describe objectively the
correlation which exists between age and scholastic success.
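
The contrast between the Pearsonian coefficient and the correlation ratio on a non-linear pattern can be sketched as follows. The data are hypothetical (grade averages that fall from age sixteen to twenty-one and rise thereafter) and are not taken from the Michigan files; the helper names `pearson_r` and `correlation_ratio` are likewise introduced only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_ratio(xs, ys):
    """Correlation ratio (eta) of y on x: sqrt(between-class SS / total SS),
    where the classes are the distinct values of x."""
    n = len(ys)
    my = sum(ys) / n
    classes = {}
    for x, y in zip(xs, ys):
        classes.setdefault(x, []).append(y)
    ss_between = sum(len(g) * (sum(g) / len(g) - my) ** 2 for g in classes.values())
    ss_total = sum((y - my) ** 2 for y in ys)
    return math.sqrt(ss_between / ss_total)


# Hypothetical mean grade by entering age: decreasing to 21, increasing after.
mean_by_age = {16: 3.0, 17: 2.8, 18: 2.6, 19: 2.4, 20: 2.2, 21: 2.0, 22: 2.4, 23: 2.8}
ages = [a for a in mean_by_age for _ in (0, 1)]
grades = [m + e for m in mean_by_age.values() for e in (-0.2, 0.2)]

print(pearson_r(ages, grades), correlation_ratio(ages, grades))
```

On data of this shape the correlation ratio exceeds the absolute value of r, which is the sense in which a single linear coefficient understates the relationship.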

The Michigan correlation charts show quite universally two linear 
trends, one from sixteen to twenty-one and the other from twenty-two 
up. It appears logical to compute the correlation coefficient for the 
group sixteen to twenty-one and for the group twenty-two up 
separately. These two coefficients give a fair picture of the situa- 
tion within the correlation chart. Two regression equations can be 
worked out and applied, each to its proper age group. 

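It is possible to compute these subcorrelations and the total cor-
relation coefficient from the same correlation chart. It is only necessary
to form the columns $f$, $\Sigma f d_x$, $(\Sigma f d_x) d_y$, $f d_y$, $f d_x^2$, $f d_y^2$, and to substi-
tute in the formula

\[
r(R: k_1, k_2, \ldots, k_s) =
\frac{\Sigma f \, \Sigma f d_x d_y - (\Sigma f d_x)(\Sigma f d_y)}
     {\sqrt{\left[\Sigma f \, \Sigma f d_x^2 - (\Sigma f d_x)^2\right]
            \left[\Sigma f \, \Sigma f d_y^2 - (\Sigma f d_y)^2\right]}}
\]

when the columns are summed for all rows whose class marks are
$k_1, k_2, \ldots, k_s$.

TABLE III. [Caption and table body illegible in the source: the correlation chart
of age and scholastic record, with computational rows, for students entering all
schools of the University of Michigan in 1930.]
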

The method is illustrated in Table III where the correlation coeffi-
cient and two subcorrelations are computed for the students entering
all schools of the University of Michigan in 1930. The “subcorrela-
tions” are the underscored items in the computational rows at the
bottom of the table.
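
The computation described above may be sketched in code. This is an illustrative sketch only: the small correlation chart `CHART` is hypothetical (it is not Table III), its rows are age classes keyed by class mark and its columns are grade classes with deviations `X_DEVS`, and the function name `subcorrelation` is introduced here for convenience.

```python
import math

# Hypothetical correlation chart: CHART[k] holds the cell frequencies of the
# age row with class mark k, one cell per grade class.
X_DEVS = [0, 1, 2]          # grade-class deviations d_x
CHART = {
    0: [1, 2, 5],
    1: [2, 4, 2],
    2: [5, 2, 1],
    3: [1, 3, 4],
}


def subcorrelation(chart, x_devs, rows):
    """Pearson r over the selected rows, from summed computational columns."""
    n = s_x = s_y = s_xy = s_xx = s_yy = 0
    for d_y in rows:
        cells = chart[d_y]
        f = sum(cells)                                        # column f
        fdx = sum(c * dx for c, dx in zip(cells, x_devs))     # sum of f*d_x in the row
        fdx2 = sum(c * dx * dx for c, dx in zip(cells, x_devs))
        n += f
        s_x += fdx
        s_xy += fdx * d_y                                     # (sum f*d_x) * d_y
        s_y += f * d_y                                        # f * d_y
        s_xx += fdx2
        s_yy += f * d_y * d_y                                 # f * d_y^2
    num = n * s_xy - s_x * s_y
    den = math.sqrt((n * s_xx - s_x ** 2) * (n * s_yy - s_y ** 2))
    return num / den


r_total = subcorrelation(CHART, X_DEVS, [0, 1, 2, 3])   # all rows
r_lower = subcorrelation(CHART, X_DEVS, [0, 1, 2])      # younger rows only
r_upper = subcorrelation(CHART, X_DEVS, [2, 3])         # older rows only
print(r_total, r_lower, r_upper)
```

In this made-up chart the total coefficient is small and negative, while the two subcorrelations are larger in absolute value and of opposite sign, which is the kind of pattern the method is intended to expose.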


TABLE IV. SUBCORRELATIONS FOR DIFFERENT GROUPS ENTERING THE
UNIVERSITY OF MICHIGAN IN 1930

                                     Literature,    Literature,
r(R: k1, k2, ..., ks)  Engineering   science and    science and    All schools
                                     arts, men      arts, women
N ...................     277           567            303           1376
r ...................    -.168         -.252           .048          -.133
S² ..................     2.48          2.52           2.57           2.63
N(R:7,8,9) ..........       9            14             11             54
r(R:7,8,9) ..........     .181          .232           .723           .222
r'(R:7,8,9) .........     .351          .353           .871           .340
S²(R:7,8,9) .........     2.24          1.28           3.22           3.51
N(R:0-6) ............     268           553            292           1322
r(R:0-6) ............    -.180         -.259           .116          -.179
r'(R:0-6) ...........    -.232         -.323           .168          -.244
S²(R:0-6) ...........     2.47          2.52           2.37           2.55

[Most row labels are illegible in the source and are inferred from the values
quoted in the text for the all-schools group.]

















It appears that r = .222 should be applied to the group composed
of those aged twenty-two years and up while r = -.179 should be applied
to the group composed of those of age twenty-one years and less.

It should be noted that the values of the subcorrelations, r = .222
and r = -.179, are not comparable with the value of the correlation
coefficient, since the range has been restricted. The restriction of
range caused by the selection of a subgroup of consecutive rows (or
columns) tends to lower the value of r. The increased absolute values of the
subcorrelations above are not due to the restriction in scale, since that
would tend to decrease them, and hence they are due to much better
fit.
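
The effect of restriction of range on r can be seen in a small constructed example. The sketch below uses hypothetical data in which y follows a single linear trend in x with a fixed scatter of ±3 about the line; nothing in it is drawn from the Michigan charts.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one linear trend, two observations per x at x - 3 and x + 3.
xs = [x for x in range(10) for _ in (0, 1)]
ys = [x + e for x in range(10) for e in (-3, 3)]

full = pearson_r(xs, ys)

# Keep only the consecutive classes x = 0..3 (a restricted range).
kept = [(x, y) for x, y in zip(xs, ys) if x <= 3]
restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(full, restricted)
```

With the trend and scatter unchanged, the coefficient on the restricted classes is smaller than on the full range, so a subcorrelation that exceeds the total coefficient in absolute value cannot be attributed to the restriction itself.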







It is possible with certain assumptions to correct the subcorrela-
tions in such a way that they may be compared with the original cor-
relation. The reader is referred to (26) for the development of the
theory and to (27) for an illustration. The computational entries in
the bottom rows of Table III follow the same plan as the illustration
of (27). The corrected subcorrelations in Table III are r'(R:7,8,9) =
.340, r'(R:0-6) = -.244 with S²(R:7,8,9) = 3.509, S²(R:0-6) =
2.545, and S² = 2.626. It is apparent in this case that there are two
distinct trends. Also the group composed of 7,8,9 has somewhat
larger deviations from the trend line.

Charts showing corresponding information for those entering the 
College of Engineering in 1930, the men entering the College of Litera- 
ture, Science, and the Arts in 1930, and the women entering the College 
of Literature, Science, and the Arts in 1930 were analyzed similarly. 
The results are given in Table IV. 


It appears, in all cases, that the two definite and distinct trends 
are present. 


5. THE FORSYTH RESULTS 


The results given by Forsyth (1) are the only ones which are 
decidedly out of line with the results of the various studies. It appears 
that no one has given a satisfactory explanation of the reason for this. 
It is possible that the situation at Illinois a quarter of a century ago 
may have been different than it is now, but this does not appear 
probable. Dean Holmes found the usual pattern at Harvard at about 
that time. 

Another key to the explanation may be the fact that the Forsyth 
data were obtained from students from all classes. If so, the correla- 
tion pattern would be different since fifteen-year-old freshmen, 
sixteen-year-old sophomores, seventeen-year-old juniors would be 
entered in different age groups. The use of this heterogeneous group- 
ing in studying age correlation gives a more positive correlation than 
does the method of the more homogeneous entering groups. As an 
illustration the correlation charts of the Michigan entering class of 
1928 (junior first semester record), 1929 (sophomore first semester 
record), and 1930 (freshmen first semester record) were superimposed 
to give a picture of the relationship between age and scholastic stand- 
ing for the group of freshmen, sophomores, and juniors enrolled in the 
first semester of 1930-1931. The individual correlations (as given
in Table II) were all negative and yet the correlation of the combined
group was r = +.010 ± .012. The inclusion of senior records would
perhaps have made this correlation a significant positive quantity.
The positive size of the coefficient might also be enlarged by an admis- 
sions policy which prevented the entrance of young students or the 
admission of a larger percentage of older students. 
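
The way in which superimposing classes can turn negative within-class correlations into a positive pooled correlation may be sketched with constructed data. Everything in the example is hypothetical, in particular the assumption that each added year in college raises the class average by a fixed amount; the sketch shows only that the arithmetic permits the reversal, not why it occurred at Michigan or Illinois.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_class(years_in_college):
    """Hypothetical class: within the class, older entrants earn lower grades;
    by assumption each year already spent in college adds 0.4 to the average."""
    ages, grades = [], []
    for entrance_age in (17, 18, 19):
        for noise in (-0.1, 0.1):
            ages.append(entrance_age + years_in_college)
            grades.append(2.4 - 0.2 * (entrance_age - 18)
                          + 0.4 * years_in_college + noise)
    return ages, grades


classes = [make_class(k) for k in (0, 1, 2)]   # freshmen, sophomores, juniors
within = [pearson_r(a, g) for a, g in classes]

pooled_ages = [a for ages, _ in classes for a in ages]
pooled_grades = [g for _, grades in classes for g in grades]
pooled = pearson_r(pooled_ages, pooled_grades)

print(within, pooled)
```

Each class taken alone gives a negative coefficient, while the cross section of all three classes gives a positive one, because age in the pooled group also reflects years already spent in college.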

Forsyth apparently used a cross section of all classes so his positive 
correlations do not necessarily contradict the negative correlation 
which is usually found when the more homogeneous class group is 
used. It must be said, however, that the comparable Michigan results 
do not appear to give as large positive coefficients as Forsyth found. 
Information from other schools with reference to the size of the 
coefficient obtained by a cross section of all classes would serve to 
establish the consistency or inconsistency of the Forsyth report with 
standard results. 


6. SUMMARY AND CONCLUSION 


It thus appears that— 

(1) There is quite universally a negative relationship between age
and freshman scholastic success which is usually measured by a cor-
relation coefficient of from .00 to -.25.

(2) This relationship can be analyzed further into two trends. 
There is a negative trend up to the entering age of twenty-one and a 
positive trend from age twenty-two. The subcorrelations measur- 
ing these trends are slightly larger in absolute value than are the 
correlations.

(3) This relationship appears to continue for a given entering group 
with perhaps a tendency for the coefficients to decrease. 

(4) The tendency described is much more pronounced for men than 
for women. 

(5) In no case is the absolute value of the coefficient large enough 
so that it can be used as the basis of individual prediction, although 
it is frequently sufficiently large to warrant the use in making predic- 
tions by age groups. 

With reference to the subsequent use of age in the prediction of 
college success, it is suggested that age measurement should be applied 
as a supplement to the more useful criteria of college success such as 
high-school record and measure of intelligence. This should be done 
(1) through partial correlation theory or (2) through the use of age 
as a basis for the selection of more homogeneous groups. If the latter 
method is used it would appear that the group should also be composed 
of members of the same sex. Thus it would be possible to study the
relationship between high-school record and college record of all men
less than eighteen years of age.
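
For reference, the first alternative mentioned above rests on the standard first-order partial correlation. The subscripts below are an assumed labeling, not the author's notation: 1 denotes college record, 2 high-school record, and 3 age.

```latex
\[
r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}
                {\sqrt{\left(1 - r_{13}^{2}\right)\left(1 - r_{23}^{2}\right)}}
\]
```

The second alternative replaces this adjustment by computing $r_{12}$ directly within a group that is homogeneous in age and sex.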


This method, which avoids the artificiality of the partial correla- 
tion method, is simple in structure and directly adapted to the use of 
tabulating machines. It cannot be used effectively by small institu- 
tions as the various homogeneous subgroups do not provide large 
enough samples. It has been used advantageously at the University 
of Buffalo (28) (29) where the investigators have been able to find 
appreciably higher correlations for certain homogeneous subgroups. 
This method seems to provide the simplest and the most effective
means by which the moderate correlation of age with scholastic success
may be made useful in prediction.


BIBLIOGRAPHY 


1. Forsyth, C. H.: “Correlation Between Ages and Grades.” Journal of Educa-
tional Psychology, Vol. iii, March, 1912, p. 164.

2. Holmes, H. W.: “Youth and the Dean: The Relation Between Academic
Discipline, Scholarship, and Age of Entrance to College.” Harvard Gradu-
ates Magazine, Vol. xxi, 1913, pp. 599-610.

3. Jones, A. L.: “College Standing of Freshmen of Various Ages.” School and
Society, Vol. iii, May 13, 1916, pp. 717-720.

4. Pittinger, B. F.: “The Efficiency of College Students as Conditioned by Age
at Entrance and Size of High School.” Sixteenth Yearbook of the National
Society for the Study of Education. Bloomington, Ill., Public School Pub.
Co., 1917, Part II, pp. 9-112. (Esp. pp. 35, 55-97, 111.)

5. Terman, L. M.: “Intelligence Tests in Colleges and Universities.” School and
Society, Vol. xiii, April 23, 1921, pp. 481-494.

6. Husband, R. W.: “Studies in Student Personnel at Dartmouth.” Journal of
Personnel Research, Vol. ii, 1923, pp. 70-79.

7. Thornberg, L. H.: “College Scholarship and Size of High School.” School and
Society, Vol. xx, August 9, 1924, pp. 189-192.

8. Wilson, M. O.: “The Intelligence and Educational Achievement of Two
Hundred Fifty Freshmen Women of the University of Oklahoma.” School
and Society, Vol. xxi, June 6, 1925, pp. 693-694.

9. Odell, C. W.: Predicting the Scholastic Success of College Freshmen. Urbana,
Ill.: University of Illinois, 1927, 54 pp. (Bureau of Educational Research,
College of Education, No. 37.) Esp. pp. 28, 29.

10. Zeigel, W. H.: “Achievements of High-School Honor Students in the Univer-
sity of Missouri.” School and Society, Vol. xxv, Jan. 15, 1927, pp. 82-84.

11. Cooper, L. B.: “A Study in Freshman Elimination in One College.” Nation’s
Schools, Vol. ii, September, 1928, pp. 25-29.

12. Gowen, J. W., and Gooch, Marjorie: “Age, Sex, and the Interrelations of
Mental Attainments of College Students.” Journal of Educational Psychol-
ogy, Vol. xvii, March, 1926, pp. 195-207.




13. Bear, R. M.: “Factors Affecting the Success of College Freshmen.” Journal
of Applied Psychology, Vol. xii, October, 1928, pp. 517-523.

14. Lloyd-Jones, E. M.: Student Personnel Work. New York: Harper and Broth-
ers, 1929, 253 pp. (Esp. pp. 56, 57, 141, 150.)

15. Brown, Ruth A.: A Comparison of Data on Freshmen Entering the University
of Michigan in the Fall of 1927. Ann Arbor, Michigan: University of
Michigan, 1930, 94 pp. (University of Michigan Administrative Studies,
I, No. 1.)

16. Gray, H. A.: Some Factors in the Undergraduate Careers of Young College
Students. New York: Teachers College, Columbia University, 1930, 66 pp.
(Teachers College, Columbia University Contribution to Education, No.
437.)

17. Klein, A. J.: Survey of Land Grant Colleges and Universities. Washington:
Superintendent of Documents, 1930. Vol. i, xxvii, 998 pp. (Esp. pp.
350-352.) (U.S. Bulletin, 1930, No. 9.)

18. Crawford, A. B.: “Forecasting Freshman Achievement.” School and Society,
Vol. xxxi, January 25, 1930, pp. 1-8.

19. Payne, A. F.: “An Experiment in Human Engineering at the College of the
City of New York.” School and Society, Vol. xxxii, August 30, 1930, pp.
292-293.

20. Manson, Grace: An Investigation of Some Problems Involved in Student Selec-
tion at the University of Michigan. Unpublished treatise on file at the Office
of Educational Investigations of the University of Michigan.

21. Harris, D.: “The Relation to College Grades of Some Factors Other Than
Intelligence.” Archives of Psychology, No. 131, July, 1931, pp. 20, 40, 48.

22. Remmers, H. H.: “Some Attributes of Superior Students.” Personnel
Journal, Vol. x, 1931, pp. 167-178.

23. Odell, C. W.: “The Effect of Early Entrance Upon College Success.” Journal
of Educational Research, Vol. xxvi, 1933, pp. 510-512.

24. Moore, M. W.: A Study of Young High-School Graduates. Teachers College
Contribution to Education, 1933, No. 583.

25. Beers, F. S.: “The Human Side of This Testing Business.” Educational
Record, Vol. xvii, No. 4, October, 1936, pp. 601-602.

26. Dwyer, P. S.: “The Use of Subcorrelation in the Analysis of Non-linear and
Non-Homoscedastic Correlation Charts.” Journal of Educational Psychol-
ogy, Vol. xxviii, 1937, pp. 541-547.

27. Dwyer, P. S.: “The Use of Subcorrelation in Determining the Predictive
Power of High-School Grades.” Journal of Educational Psychology, Vol.
xxviii, 1937, pp. 673-680.

28. Wagner, M. E., and Strabel, E.: “Homogeneous Grouping as a Technique for
Improving Prediction Coefficients.” School and Society, Vol. xl, 1934, pp.
887-888.

29. Wagner, M. E., and Strabel, E.: “Homogeneous Grouping as a Means of
Improving the Prediction of Academic Performance.” Journal of Applied
Psychology, Vol. xix, 1935, pp. 426-446.







THE RELATIONSHIP BETWEEN INTELLIGENCE AND 
THE RETENTION OF COURSE MATERIAL IN 
INTRODUCTORY PSYCHOLOGY AFTER 
LENGTHY DELAY PERIODS 


ROBERT I. WATSON 


University of Idaho, Southern Branch 


The problem of the relationship of intelligence to retention has 
received but scanty and incidental attention in the literature. In most 
instances wherein experimental results have been reported, retention 
for short delay periods only has been studied. In addition, only one
previous worker (Lee”) has examined this relationship, using the 
methods of recognition and recall both immediate and delayed for the 
same subjects on similar materials. In the forementioned study 
the delay period measured was twenty-four hours. In view of these 
considerations, after a brief review of the relevant studies, the writer 
will describe an investigation of the relationship between intelligence 
and recognition, and intelligence and recall, for delay periods extending 
over forty-six months for material learned in connection with a course 
in introductory psychology. 


THE LITERATURE 


The literature on this problem prior to 1925 has been reviewed by 
Lee,*? who showed that previous to her investigation, estimates served 
as measures of intelligence. In general two techniques were followed; 
the correlational and the comparison of supposedly discrete groupings, 
such as the normal and feebleminded. For further details the reader 
is referred to her monograph.”°* 

Certain of the later studies although of interest and importance are 
not relevant to the present discussion, since they did not include exam- 
ination of both immediate and delayed retention. In investigations 
such as those of Garrett,!° Kennedy, Foster,’ Powers,** Hegge,'* 
Maiti,??, Hurlock and Newmark, and McElwee?* only immediate 
retention was studied. Since Bolton‘ averaged scores for immediate 
retention and scores on a comparable form one week later, her study 
was also one of immediate retention. The study of Layton’ contained 
only correlations between intelligence and delayed retention. 





* In addition to those studies reviewed by her the following also used the tech- 
nique of estimating intelligence: Bell, Carey,® Carothers,® Guillet,!? King and 
Homan,” Kitson,’ Lyon,?! Sharp,” Travis,2° and Winch.” 



The studies of Jones,!® Lee,” Anderson and Jordan,' Bassett,? 
Dietze,? White, and Grant!! compared both immediate and delayed 
retention in their relation to intelligence and, therefore, will be given 
more detailed consideration. 

Jones,!® who studied the recall of psychology by college students, 
found identical average rank-order correlations of .35 between intelli- 
gence and immediate recall and intelligence and delayed recall. How- 
ever, the longest delay interval was eight weeks. 

Lee” studied the relationship between recognition and intelligence 
and recall and intelligence using similar materials, consisting of pic- 
tures, words, forms and syllables, and the same three hundred ninety 
public-school children for both methods of retention measurement 
after thirty seconds, twenty-four hours, and then by relearning. The 
average correlation of recognition and intelligence is reported as .26 
and the average correlation of recall and intelligence as .39. When 
immediate retention (retention after thirty seconds) and retention 
after twenty-four hours are compared, it is probable that the correla- 
tions for delayed and immediate retention are not significantly differ- 
ent in magnitude. In four instances that for delayed retention is the 
larger; in an equal number that for immediate retention is greater. 
The correlations of each sort of material with intelligence test scores are 
such that three of the four average correlations for recognition tests are 
higher than the lowest of the four average correlations for recall 
tests, and one recognition average is higher than two of the recall 
averages. When recognition and recall of similar materials are com- 
pared, however, in three out of four instances the average correlations 
of intelligence and recall scores are the higher. 

In their study of the recall of Latin words and phrases on the part of 
thirty seventh-grade children Anderson and Jordan! found a correla- 
tion of .65 between intelligence and immediate recall, and a correlation 
of .58 between intelligence and delayed recall after eight weeks. 

In her study of the retention of history using combined recognition 
and recall scores Bassett? reported correlations between retention and 
intelligence for seven hundred seventy-six grammar-school children for 
delay intervals ranging from none to sixteen months. The correlations 
ranged from .17 to .71 with an average of .37. Although she does not 
comment upon it, her results show that there was a tendency for the 
relationship to decrease as the delay increased. In nine out of eleven 
instances the correlations for immediate retention and intelligence are 
larger than those for the longest delay, for which each of the eleven
groups was later measured. In the same number of instances, imme-
diate retention measures have the largest correlation with intelligence,
irrespective of the delay intervals for which each group was later
measured.

Dietze,? who measured recognition memory of factual substance 
material, read a single time on the part of nine hundred eighty-three 
junior and senior high-school pupils, reported average correlations 
with intelligence of .55 for immediate memory, .43 for memory after 
one day, .51 for memory after fourteen days, .45 for memory after 
thirty days, and .25 for memory after one hundred days. 

Using about two hundred fifty high-school students as subjects, 
White²⁸ investigated the retention of algebra measured by combined
recognition and recall scores. On one test the correlations between 
retention and intelligence were .35, .13, .19 and .16 for delay intervals 
of none, three, nine, and fifteen months. On a second test the corre- 
lations were .40, .40, .40, and .42, respectively, for delays of none, five, 
eight, and sixteen months. 

Grant¹¹ investigated recall on ten rote and meaningful tests on the
part of about one hundred fifty college and academy students immedi- 
ately, and after delay intervals of one day, one week, and four or five 
weeks. Among the various tests for retention, three are especially 
pertinent because they consist of meaningful materials while the others 
measure rote memory. The correlations for the delay intervals 
mentioned above between poetry and intelligence are .59, .57, .57, and
.49. In one test of prose description they are respectively .58, .53,
.57, and .51; and in another, .24, .45, .29, and .39.

Certain additional summary statements seem appropriate in evaluating
these studies. In five studies, i.e., those of Jones,¹⁵ Lee,²⁰
Anderson and Jordan,¹ Bassett,² and Grant,¹¹ no reliability coefficients
for the memory tests were reported. This makes adequate evaluation
somewhat difficult. In the studies of Lee²⁰ and Anderson and Jordan¹
the National Intelligence Test was used, which contains a code learning
test, presumably making for a higher correlation than otherwise would
be the case. Only one study, that of Lee,²⁰ measured both recognition
and recall memory separately on tests similar in content and on the
same subjects. To be sure, Bassett² and White²⁸ included both
recognition and recall items, but they were pooled and a single
composite retention measure was secured.

It is apparent that there is no information to be secured from the 
literature on the relationship of immediate and delayed recognition
and recall to intelligence, on tests similar in content on the same
subjects, and in which the retention tests were shown to be reliable. 
In addition, no study of any sort whatsoever examined the relationship 
of retention to intelligence for delay intervals longer than sixteen 
months. 


SUBJECTS, PROCEDURE AND CONDITIONS 


While investigating the retention of material learned in connection 
with a required course in introductory psychology, the present writer 
had occasion to collect data in regard to the intelligence of the subjects. 
Certain pertinent aspects of this previous study²⁷ will, therefore, be
summarized. 

A group of college students were tested for immediate recognition 
and recall by typical examination questions in psychology. After a 
lapse of various periods of time, during which the subjects neither 
reviewed the material nor took advanced courses in the field, they were 
examined for delayed retention on the same tests. Both testings 
took place in the classroom. 

A description of the several subgroups, examined in the present 
paper, is given in Table I, in terms of the group designations hereafter 
used,* the number of cases, and the delay intervals measured in 
months. Each subject was measured for delayed retention for three 
periods, a test of a different sort being used for each interval, although 


TABLE I. DESCRIPTION OF THE SUBGROUPS

                             Retention intervals measured in months
Group    Number of cases     Test I      Test II      Test III
  1            13               2           4             8
  2            20               6           8            10
  3            13              18          20            22
  5            26              42          44            46
  7            12

all testing took place on one occasion. This procedure was made 
possible by the fact that immediate retention scores for three different 
previous occasions were available. Test I was the objective portion 





* Groups 4 and 6 of the previous study were eliminated because they contained 
too few subjects to make the correlation technique advisable. 





of the final examination for the second semester; Test II covered 
material drawn from Allport’s Social Psychology which was not subsequently
touched upon in class, i.e., the student received no lectures
or discussions on the material, merely reading it independently outside 
of class, and Test III was the objective portion of the first semester 
final examination, the content of which was not formally reviewed 
during the second semester. Group 7 was a control comparable 
in age and intelligence to the experimental groups. This group never 
studied the material in question, but took the same tests as those 
which served as measures of delayed retention. 

The experimental groups were highly comparable to one another, 
and to the total groups tested for immediate retention from which they 
were derived in immediate retention scores, age, grades received in 
psychology, and percentile position on aptitude tests. This comparability
was exhibited by showing that the differences were non-significant.
The technique followed was the derivation of Student's t-ratios,
as described in Fisher.⁸ In addition, the course in psychology
was shown to be substantially the same for all subjects, and the two 
instructors with whom it was taken to be similar in the grades given 
and the interest created in the course. 

The items of the retention tests were classified as recognition and 
recall questions. The tests for the former will be described first. 
All items where alternative answers were available to the subject, 
consisting mainly of true-false questions, were considered as recogni- 
tion items, since the subject was required to make a judgment as to 
facts placed before him. They were scored by the formula: The total 
number of items, minus twice the number of errors, minus the omis- 
sions. There were four specific forms used in Test I, five in Test II, 
and six in Test III. For each subject there was one form of each of 
the tests. The reliability coefficients ranged from .68 to .85 with a 
median reliability of .74. The probable errors were not more than 
±.04. The number of possible points on any one form ranged from
fifteen to forty-three; all except three of the fifteen having between 
fifteen and thirty. 
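
As an illustration of the scoring rule just described, the following minimal sketch computes a recognition score from the counts on a single form. The function name and the sample counts are hypothetical and are not taken from the study. For two-alternative items the rule amounts to the number right minus the number wrong.

```python
def recognition_score(total_items: int, errors: int, omissions: int) -> int:
    """Total items, minus twice the errors, minus the omissions."""
    if min(total_items, errors, omissions) < 0 or errors + omissions > total_items:
        raise ValueError("counts must be non-negative and fit within the form")
    return total_items - 2 * errors - omissions


# Hypothetical form: 30 items, 4 answered wrongly, 2 left blank.
rights = 30 - 4 - 2
wrongs = 4
print(recognition_score(30, 4, 2), rights - wrongs)  # the two values agree
```

Subtracting errors twice is a correction for guessing: on a true-false item a blind guess is as likely to be right as wrong, so each error is taken to cancel one lucky success.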

The recall items consisted of naming, completion, and listing ques- 
tions, since they required written substance reproduction. They were 
scored as to the number correct. The number of specific forms was 
the same as for recognition. The reliability coefficients ranged from 
.72 to .88 with a median of .78. The probable errors were not more
than ±.03. The number of possible points ranged from twenty-five
to sixty-nine; ten of the fifteen forms having between forty and sixty
points.

The raw scores for both the recognition and recall tests were con- 
verted into percentage scores since the various forms differed, as 
indicated above, in the number of items. The immediate retention 
percentage scores were derived by dividing the immediate retention 
scores by the possible maximum scores. The delayed retention 
percentage scores utilized in the present paper were derived by dividing 
the delayed retention scores by the possible maximum scores. 

The tests used for the measurement of intelligence were: The 
Dearborn Group Test Series II, Examination C Revised Edition; The 
Otis Self-Administering Test of Mental Ability, Higher Examination, 
Form D; and, The Inglis Test of English Vocabulary Form A, or B. 
The average percentile position on the three tests was calculated for 
each subject. These percentile positions served as the measures of 
intelligence used in the present study. 


RESULTS 


These measures reveal certain interesting relationships. The 
general technique for the derivation of these relationships is as follows: 
Rank-order correlations were calculated, and then changed to product- 
moment correlations by means of a table based on the appropriate 
formula. These correlations were then corrected for attenuation and 
probable errors calculated.* Twelve was arbitrarily selected as the 
smallest number of subjects in any one group for which measures of 
relationship would be derived. Since the groups were relatively 
homogeneous with respect to age, as shown previously,²⁷ the correction
for age by the partial correlation technique was not utilized.
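
The chain of computations described above can be sketched as follows. This is only an illustration under stated assumptions: it takes the three steps to be Spearman's rank-difference coefficient, Pearson's sine conversion to a product-moment value, and Spearman's correction for attenuation, and it assumes no tied ranks. The function names, the sample scores, and the reliabilities in the usage lines are hypothetical.

```python
import math


def ranks(values):
    """Rank from 1 (smallest) to N; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_order_correlation(x, y):
    """rho = 1 - 6 * sum(d^2) / (N * (N^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def to_product_moment(rho):
    """r = 2 * sin(pi * rho / 6)."""
    return 2 * math.sin(math.pi * rho / 6)


def correct_for_attenuation(r12, r11, r22):
    """Divide the obtained r by the square root of the two reliabilities."""
    return r12 / math.sqrt(r11 * r22)


# Hypothetical scores for five subjects on two measures.
rho = rank_order_correlation([10, 20, 30, 40, 50], [7, 5, 9, 8, 12])
r = to_product_moment(rho)
print(rho, r, correct_for_attenuation(r, 0.74, 0.90))
```

Because the correction divides by a quantity smaller than one, corrected coefficients are always at least as large as the obtained ones, which bears on comparisons with studies that report uncorrected values.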

Before presenting these correlations, it is pertinent to mention 
certain facts that should be taken into consideration in evaluating 
them. It is evident that correlations based on as few as twelve sub- 





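* The formulas used were as follows:

$$\rho = 1 - \frac{6\sum d^{2}}{N(N^{2} - 1)}$$

$$r = 2\sin\left(\frac{\pi}{6}\,\rho\right)$$

$$r_{\infty\omega} = \frac{r_{12}}{\sqrt{r_{11}}\,\sqrt{r_{22}}}$$

[The formula for the probable error of the corrected coefficient is illegible.]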





jects possess a considerable possibility of error, conventionally 
expressed in terms of the probable error. To anticipate, however, a 
series of consistent results is apt to have significance that is not 
possessed by a single measure. In addition, the primary problem at 
this juncture—the effect of the length of delay interval on the rela- 
tionship of retention to intelligence—is cast in terms of large differ- 
ences. The retention intervals extend not over days or months, but 
over years, so that the paucity of the cases is, perhaps, compensated 
for by the length of the delay period elapsing between successive 
measures. In addition, the samples used have been shown to be 
similar and representative of the total population from which they were 
drawn. 

Table II, which gives the correlations between immediate retention 
and intelligence, shows that a substantial relationship does exist.* 
The recognition coefficients range from .29 to .68; the recall coefficients 
are slightly less variable, ranging from .36 to .65. Two of the recogni- 
tion and none of the recall coefficients are less than twice the size of 
their probable error. On the other hand, only four of the recognition
and five of the recall coefficients out of a total possible of twelve in
each instance are completely significant, i.e., more than four times
their probable error.
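
The two thresholds used here (less than twice, and more than four times, the probable error) can be checked mechanically. The sketch below applies them to the twelve immediate recognition coefficients of Table II, entered in hundredths so that the comparisons are exact; the function names are illustrative only.

```python
# (r, PE) pairs in hundredths, from the recognition columns of Table II.
RECOGNITION = [
    (63, 15), (34, 17), (51, 17), (30, 14),  # Test I, Groups 1, 2, 3, 5
    (40, 18), (30, 17), (68, 13), (32, 14),  # Test II, Groups 1, 2, 3, 5
    (29, 20), (61, 12), (55, 17), (51, 12),  # Test III, Groups 1, 2, 3, 5
]


def count_below(pairs, multiple):
    """Number of coefficients smaller than `multiple` times their PE."""
    return sum(1 for r, pe in pairs if r < multiple * pe)


def count_above(pairs, multiple):
    """Number of coefficients larger than `multiple` times their PE."""
    return sum(1 for r, pe in pairs if r > multiple * pe)


print(count_below(RECOGNITION, 2), count_above(RECOGNITION, 4))
```

The counts agree with the two and four recognition coefficients cited in the text.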


TABLE II. THE CORRELATION BETWEEN IMMEDIATE RETENTION AND INTELLIGENCE

                      Test I                      Test II                     Test III
              Recognition     Recall      Recognition     Recall      Recognition     Recall
 Group    N      r    PEr      r    PEr      r    PEr      r    PEr      r    PEr      r    PEr
   1     13    .63    .15    .53    .17    .40    .18    .45    .18    .29    .20    [?]    .18
   2     20    .34    .17    .41    .15    .30    .17    .53    .13    .61    .12    .36    .17
   3     13    .51    .17    .49    .18    .68    .13    .65    .14    .55    .17    .40    .18
   5     26    .30    .14    .50    .12    .32    .14    .57    .11    .51    .12    .57    .11

The relationship between delayed retention and intelligence is 
given in Table III. Excluding Group 7, the recognition coefficients 
which are all positive, range from .25 to .82; whereas the recall coeffi- 
cients range from .16 to .86. There are three recognition and two 





* All reported correlations have been corrected for attenuation. 





recall coefficients less than twice their probable error. On the other 
hand, five of the twelve recognition and three of the twelve recall 
coefficients are more than four times their probable error. The cor- 
relations based on scores for Group 7 are all rather low, none being 
completely reliable. 


TABLE III. THE CORRELATION BETWEEN DELAYED RETENTION AND INTELLIGENCE

                      Test I                      Test II                     Test III
              Recognition     Recall      Recognition     Recall      Recognition     Recall
 Group    N      r    PEr      r    PEr      r    PEr      r    PEr      r    PEr      r    PEr
   1     13    .32    .20    .16    .21    .47    .18    .46    .18    .25    .22    .47    .18
   2     20    .45    .15    .46    .14    .29    .16    .29    .17    .43    .15    .34    .17
   3     13    .72    .12    .50    .18    .46    .18    .86    .05    .67    .14    .59    .16
   5     26    .57    .12    .59    .11    .82    .07    .61    .11    .79    .08    .57    .15
   7     12    .54    .16    .33    .20    .24    .21    .42    .20    .28    .21    .30    .20

As the results stand, they are somewhat difficult to interpret. 
Therefore, the correlations for the groups were combined in various 
ways. The correlations for these combinations were calculated by 
the method described in Fisher.⁸ The z values were derived from the
calculated correlations, averaged, and then transformed back to r.* 
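
A minimal sketch of this combination procedure follows. It assumes that each z value is weighted by N − 3, as in Fisher's treatment; the text says only that the z values were "averaged", so the weighting is an assumption, although it does reproduce the combined value of .41 reported in Table IV for immediate recognition on Test I. The function name is illustrative.

```python
import math


def combine_correlations(rs, ns):
    """Average correlations through Fisher's z, weighting each by N - 3."""
    weights = [n - 3 for n in ns]
    z_mean = sum(w * math.atanh(r) for r, w in zip(rs, weights)) / sum(weights)
    return math.tanh(z_mean)


# Immediate recognition, Test I, Groups 1, 2, 3 and 5 (Table II).
combined = combine_correlations([0.63, 0.34, 0.51, 0.30], [13, 20, 13, 26])
print(round(combined, 2))
```

The simple mean of the same four coefficients is about .45, which shows why the transformation and the weighting matter when the groups differ in size.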

The combinations considered relevant for immediate and delayed 
recognition and recall treated separately in all four instances are as 
follows: (1) The combined groups irrespective of delay interval, or test, 
hereafter referred to as the Total Groups. (2) The combined groups 
for Tests I, II, and III, treated separately irrespective of delay inter- 
val, hereafter referred to as the Test Groups. (3) Groups 1 and 2 
measured for delay intervals of from two to ten months, irrespective 
of test, hereafter referred to as the Short Delay Groups. (4) Groups 
3 and 5 measured for delay intervals of from eighteen to forty-six 
months, irrespective of test, hereafter referred to as the Long Delay 
Groups. (5) Group 7, irrespective of test, hereafter referred to as the
Control Groups. 





* Direct averaging of correlations is not permissible since they do not increase 
along a linear scale. 





Presented in Table IV are the correlations between retention and 
intelligence for the various combined groups, both for immediate 
recognition and recall and for delayed recognition and recall in terms 
of the group designation, the number of cases, the coefficient of cor- 
relation, and the probable error. 

In the Total Groups, immediate recognition correlates .44 ± .05,
and immediate recall correlates .48 ± .05, with intelligence. The
close correspondence of these two measures and the size of the probable
errors make it plausible to consider that both immediate recognition
and immediate recall show a similar degree of relationship to
intelligence.


TABLE IV. THE CORRELATION BETWEEN RETENTION AND INTELLIGENCE FOR
VARIOUS COMBINED GROUPS

                        Immediate      Immediate       Delayed        Delayed
                       Recognition       Recall      Recognition       Recall
Group             N      r    PEr      r    PEr      r    PEr      r    PEr
Total Groups    216    .44    .05    .48    .05    .62    .03    .52    .05
Test I           72    .41    .08    .48    .08    .63    .06    .48    .07
Test II          72    .40    .08    .47    .08    .60    .07    .57    .07
Test III         72    .51    .07    .47    .08    .62    .07    .50    .07
Short Delay      99    .44    .07    .46    .07    .38    .07    .37    .07
Long Delay      117    .45    .06    .49    .06    .76    .04    .62    .05
Control         [?]                                .36    .11    .35    .11

The correlations for Tests I, II, and III by the method of recognition
are .41 ± .08, .40 ± .08, and .51 ± .07. The correlation for
Test III is somewhat higher than the others. However, the critical
ratio* for the difference between the correlations for Tests II and III





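* The formula used for calculation of the critical ratio in this and similar future
instances where the same individuals made up both groups is

$$\frac{r_{12} - r_{13}}{PE_{\text{diff.}}}$$

in which

$$PE_{\text{diff.}} = \sqrt{PE_{r_{12}}^{2} + PE_{r_{13}}^{2} - 2\,r_{r_{12}r_{13}}\,PE_{r_{12}}\,PE_{r_{13}}}$$

and in which

$$r_{r_{12}r_{13}} = r_{23} - \frac{r_{12}\,r_{13}\,(1 - r_{12}^{2} - r_{13}^{2} - r_{23}^{2} + 2\,r_{12}\,r_{13}\,r_{23})}{2(1 - r_{12}^{2})(1 - r_{13}^{2})}$$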





is but 1.35, indicating an unreliable difference approximately thirty 
per cent of what it should be in order to insure a significant difference 
greater than zero.* Corresponding coefficients for recall are .48 ± .08,
.47 ± .08, and .47 ± .08, which show no differences in the degree to
which immediate recall is related to intelligence according to the 
test concerned. 

The Short Delay and Long Delay Groups give correlations of
.44 ± .07 and .45 ± .06 between immediate recognition and intelligence.
The recall correlations for the same groups are, respectively,
.46 ± .07, and .49 ± .06. It is evident that groups later measured
after shorter intervals for delayed retention, and groups later measured 
after longer intervals for delayed retention, show no significant differ- 
ences in correlation between immediate retention and intelligence. 
If differences are found between these groups when delayed retention 
is considered, they cannot be attributed to differences in the degree 
to which the immediate retention scores of the groups were related 
to scores on the intelligence tests. 

Attention will now be turned to the correlations between delayed 
retention and intelligence. There is no tendency in the Total Groups 
for either the delayed recognition or recall correlation with intelligence 
to be higher than the other. Irrespective of the length of the delay 
interval or the test, the correlations with intelligence are .62 ± .03
and .52 ± .05 for the two methods of measurement respectively.

Scores on Tests I, II, and III for delayed recognition correlate
.63 ± .06, .60 ± .07, and .62 ± .07, respectively, with intelligence.
Recall correlations with intelligence for similar groupings are .48 ± .08,
.57 ± .07, and .50 ± .07. Taking into consideration the size of the
probable errors there is no difference of any moment in the correlations 
between either delayed recognition and intelligence or delayed recall 
and intelligence among the three tests. 

On the other hand, there is a very definite increase in the relation- 
ship of retention to intelligence as the delay interval increases when 
one considers the delayed retention coefficients. The Short Delay 
Group (two to ten months delay) has a correlation between recognition 
and intelligence of .38 ± .07. Recall for the same delay period gives
a coefficient of .37 ± .07. The Long Delay Group (eighteen to forty-six
months delay) gives correlations of .76 ± .04 and .62 ± .05 for
recognition and recall measures, respectively. The critical ratio for 





* A critical ratio of 4.00 or more is taken to mean a completely significant 
difference. 





the difference between recognition coefficients is 4.69; for the difference 
between recall coefficients it is 2.90.* In the first instance there are 
one hundred chances, and in the second ninety-seven chances in one 
hundred of a significant difference between the correlations for the 
shorter and longer delay intervals. It would appear that the reten- 
tion scores for the groups measured for shorter delay periods show a 
lesser degree of relationship to intelligence than those for the groups 
measured for longer delay periods. 
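
The critical ratios and the "chances in one hundred" quoted here can be approximated from the coefficients and probable errors given in the text. The sketch below assumes independent samples, so that the probable error of a difference is the square root of the sum of the squared probable errors, and it assumes that the chances are the one-tailed normal probability for a deviation of the given size, with one probable error equal to 0.6745 standard deviations. The function names are illustrative, and small discrepancies from the printed ratios (for example 4.71 against 4.69) are to be expected because the inputs are rounded.

```python
import math

PE_IN_SIGMA = 0.6745  # one probable error expressed in standard deviations


def critical_ratio(r1, pe1, r2, pe2):
    """Difference between two independent r's divided by its probable error."""
    return abs(r1 - r2) / math.sqrt(pe1 ** 2 + pe2 ** 2)


def chances_in_100(ratio):
    """One-tailed normal probability (in per cent) for a given critical ratio."""
    z = ratio * PE_IN_SIGMA
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Long Delay versus Short Delay Groups, delayed retention.
recognition = critical_ratio(0.76, 0.04, 0.38, 0.07)
recall = critical_ratio(0.62, 0.05, 0.37, 0.07)
print(round(recognition, 2), round(chances_in_100(recognition)))
print(round(recall, 2), round(chances_in_100(recall)))
```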

The correlations between recognition and intelligence for the 
Control Group have a combined coefficient of .36 ± .11. For the
recall measures there is a correlation of .35 ± .11. These are smaller
than those for the experimental groups which, as pointed out before,
are .62 ± .03, and .52 ± .05 for recognition and recall. The critical
ratio for recognition measures is 2.28; for recall measures, 1.40. There 
are ninety-four and eighty-three chances in one hundred of a true 
difference greater than zero. Evidently there is a possibility of greater 
relationship between intelligence and retention in the experimental 
groups than in the control groups although the critical ratios are by 
no means large enough to insure complete significance. 


DISCUSSION 


Throughout this discussion it is well to bear in mind that by the 
very nature of the situation, there was no control over the learning 
process. This would presumably operate to decrease the size of the 
various correlations. 

Certain general tendencies in regard to the relationship of retention 
and intelligence may be observed from these data. It is obvious that 
the relationship is positive in sign and substantial in magnitude. 
None of the correlations are negative and very few less than .30 with 
the majority about .40 or .50. This is in fair agreement with the work 
of previous investigators, although close comparison is rendered diffi- 
cult because they did not correct for attenuation nor measure precisely 
the same relationships. In so far as any rough comparison can be 
attempted the results of Jones,¹⁵ Lee,²⁰ Bassett,² Dietze,⁷ and White,²⁸
are such that if corrected for attenuation they possibly would be 





* Since in these cases the correlations were calculated from independent
samples, the conventional formula for the probable error of the difference

$$PE_{(r_{1} - r_{2})} = \sqrt{PE_{1}^{2} + PE_{2}^{2}}$$

was used in obtaining the critical ratios in these and similar future instances.








similar to those obtained by the present writer. On the other hand, 
Anderson and Jordan¹ and Grant¹¹ obtained coefficients somewhat
greater than those reported here. 

Neither recognition nor recall is more closely related to intelligence
either in immediate or delayed retention. The summary coefficients
for the Total Group as previously pointed out are .44 and .48
for immediate recognition and recall, and .62 and .52 for delayed
recognition and recall. It will be observed that in one case the
correlation for recognition is the larger of the two, in the other
instance that for recall. Inspection of Table IV for other comparisons of
recognition and recall measures, in the Test Groups, Short Delay,
Long Delay, and Control Groups, shows that the differences are
obviously non-significant. In each case the coefficients show that
recognition ability is as closely related to intelligence as recall ability.

The only previous study bearing on this issue is that of Lee,²⁰ who
interpreted her results as showing that recall is more closely related 
to intelligence than is recognition. However, the data are by no 
means unequivocal since critical ratios were not calculated. It would 
appear from examination by the writer that only about two of the 
possible twelve comparisons would yield a critical ratio of 4.00 or 
more. These, and previously adduced considerations, make it possi- 
ble to interpret her results as showing that recognition and recall 
ability are related to intelligence to approximately the same degree. 

It has been found that the relationship between intelligence and 
retention seems first to decrease as the delay interval increases and 
then to increase again, both in recognition and recall. This can be 
observed by comparing the correlation between immediate retention 
and intelligence, correlations for shorter delay intervals, and
correlations for longer delay intervals. The correlations between intelligence
and immediate recognition, delayed recognition after from two to
ten months, and delayed recognition after from eighteen to
forty-six months are .44 ± .05, .38 ± .07, and .76 ± .04, respectively.
Recall coefficients are, respectively, .48 ± .05, .37 ± .07, and .62 ± .05.
In both instances, the coefficients for the shorter delay intervals are 
smaller than those for immediate retention. In both cases the coef- 
ficients for the shorter delay intervals are considerably smaller than 
those for the longer intervals. The coefficients for the longer inter- 
vals are, in both instances, appreciably greater than those for immedi- 
ate retention. 





There is some support in the literature for the first directional 
trend exhibited; namely, the lesser degree of relationship between 
intelligence and delayed retention as compared with immediate 
retention. The studies of Jones,¹⁵ Lee,²⁰ Anderson and Jordan,¹ and
Grant,¹¹ in which no correlations for delays longer than eight weeks
were reported, can safely be disregarded as far as this issue is concerned,
since it is hardly likely that such a short period of delay would
establish any trend. The correlations found by Bassett² and Dietze⁷ both
demonstrated this trend clearly. In the case of those of White,²⁸
one test showed the trend and the other indicated no difference 
between the correlation of immediate retention and intelligence, and 
the correlation of delayed retention and intelligence. 

The increase with the longer delay intervals has not been previ- 
ously found, but no prior investigation over as long a period seems to 
have been made. 

There seems no doubt that immediate recognition and immediate 
recall are both related to intelligence, and that this relation decreases 
with shorter delay intervals. This might plausibly be regarded as due 
to the greater confusion introduced by the interpolated activities. 
The smaller correlation between intelligence and retention after shorter 
delay intervals may have been due to a blocking set up by the mass of 
partially retained material which was greater at this point for the 
more intelligent and consequently more confusing. After longer 
delays the competing response tendencies became less in number and, 
although the more intelligent still had a greater number of such 
response tendencies, the interference decreased more for the more 
intelligent, so a closer relationship between intelligence and retention 
was obtained. Partial corroboration for the decrease in competing 
response tendencies was found in a trend for the number of retention 
test questions answered incorrectly to decrease, and the omissions to 
increase as the delay interval increased. Subjects after long delays 
either recalled or recognized correctly or gave no response at all. 
This explanation, however, is hypothetical, and may be found inade- 
quate without in any way modifying the empirical results obtained. 


CONCLUSIONS 


1. A similar and substantial degree of relationship to intelligence
was exhibited by (1) immediate recognition and recall, and (2) delayed
recognition and recall.

2. There was a tendency for the relationship of recognition and
intelligence, and recall and intelligence first to decrease and then to
increase as the delay interval increased.


BIBLIOGRAPHY

1. Anderson, J. P., and Jordan, A. M.: “Learning and retention of Latin words and phrases.” J. Educ. Psychol., Vol. XIX, 1928, pp. 485-496.
2. Bassett, S. J.: Retention of history in the sixth, seventh, and eighth grades with special reference to the factors that influence retention. Johns Hopkins Univ. Stud. in Educ., No. 12, 1928, pp. 110.
3. Bell, J. C.: “Mental tests and college freshman.” J. Educ. Psychol., Vol. VII, 1916, pp. 381-399.
4. Bolton, E. B.: “Relation of memory to intelligence.” J. Exper. Psychol., Vol. XIV, 1931, pp. 37-67.
5. Carey, N.: “Factors in the mental processes of school children, II. On the nature of the specific mental factors.” Brit. J. Psychol., Vol. VII, 1915, pp. 70-92.
6. Carothers, E. F.: “Psychological examination of college students.” Arch. Psychol., No. 46, 1922, pp. 82.
7. Dietze, A. G.: “The relation of several factors to factual memory.” J. Appl. Psychol., Vol. XV, 1931, pp. 563-574.
8. Fisher, R. A.: Statistical methods for research workers. London: Oliver, 1936, pp. xi + 339.
9. Foster, F. C.: “Verbal memory in preschool children.” J. Genet. Psychol., Vol. XXXV, 1928, pp. 26-44.
10. Garrett, H. E.: “The relation of tests of memory and learning to each other and to general intelligence in a highly selected adult group.” J. Educ. Psychol., Vol. XIX, 1928, pp. 601-613.
11. Grant, M. E.: “Some theories and experiments in the field of memory.” J. Educ. Psychol., Vol. XXIII, 1932, pp. 511-527.
12. Guillet, C.: “A study of the memory of young women.” J. Educ. Psychol., Vol. VII, 1917, pp. 65-84.
13. Hegge, T. G.: An experiment with the logical memory of subnormals. Training School Bull. (Vineland), Vol. XXVI, 1929, pp. 82-86.
14. Hurlock, E. B., and Newmark, E. D.: “The memory span of preschool children.” J. Genet. Psychol., Vol. XXXIX, 1931, pp. 157-173.
15. Jones, H. B.: “Experimental studies of college teaching, the effect of examination on permanence of learning.” Arch. Psychol., No. 68, 1923, pp. 71.
16. Kennedy, L. R.: “The retention of certain syntactical principles by first and second year Latin students after various time intervals.” J. Educ. Psychol., Vol. XXIII, 1932, pp. 132-146.
17. King, I., and Homan, T. B.: “Logical memory and school grades.” J. Educ. Psychol., Vol. IX, 1918, pp. 262-269.
18. Kitson, H. D.: “The scientific study of the college student.” Psychol. Monog., Vol. XXIII, No. 1, 1917, p. 81.
19. Layton, E. T.: “Persistence of learning in elementary algebra.” J. Educ. Psychol., Vol. XXIII, 1932, pp. 46-55.
20. Lee, A. L.: “An experimental study of retention and its relation to intelligence.” Psychol. Monog., Vol. XXXIV, No. 157, 1925, pp. 45.
21. Lyon, D. O.: “The relation of quickness of learning to retentiveness.” Arch. Psychol., No. 34, 1916, pp. 60.
22. Maiti, H. P.: “Memory and intelligence.” Indian J. Psychol., Vol. VI, 1931, pp. 169-181.
23. McElwee, E. W.: “Is a test of visual memory affected by maturity?” J. Appl. Psychol., Vol. XIX, 1935, pp. 463-466.
24. Powers, S. R.: A diagnostic study of the subject-matter of high-school chemistry. Teachers College Cont. Educ., No. 149, 1924, pp. 84.
25. Sharp, S. E.: “Individual psychology. A study in psychological methods.” Amer. J. Psychol., Vol. X, 1899, pp. 329-391.
26. Travis, A.: “Reproduction of short prose passages: A study of two Binet tests.” Psychol. Clinic, Vol. IX, 1915, pp. 189-209.
27. Watson, R. I.: “An experimental study of the permanence of course material in introductory psychology.” Arch. Psychol., No. 225, 1938, pp. 64.
28. White, A. L.: The retention of elementary algebra through quadratics after varying intervals of time. Washington: Judd and Detweiler, 1932, pp. 67.
29. Winch, W. H.: “Immediate memory in school children.” Brit. J. Psychol., Vol. I, 1901, pp. 127-134.





A PRELIMINARY INVESTIGATION INTO THE 
PROBLEM OF MEASURING ENGINEERING APTITUDE 


S. R. LAYCOCK¹ AND N. B. HUTCHEON²


University of Saskatchewan 


1. THE STATEMENT OF THE PROBLEM 


The authors believe that more accurate measures of engineering 
aptitude than are, at present, available would be of great value to 
vocational counsellors in high schools and to administrators of engi- 
neering colleges. Ideally, through wise vocational guidance based on 
suitable tests, it should be possible to eliminate the waste of human 
effort involved and the ill-effects of failure on those individuals who, 
after spending one or more years at college, find themselves unfitted 
for engineering courses. In the University of Saskatchewan, only 
sixty-five per cent of those entering the College of Engineering as 
freshmen reach the second year, forty-five per cent reach the third 
year and thirty-five per cent the fourth year. It should be pointed 
out, however, that some students go to other universities for their 
final years in mining and electrical engineering. Obviously other 
factors than lack of aptitude for engineering enter into the above 
figures, but the fact that the elimination at the end of the first 
year reaches thirty-five per cent is indicative of the heavy mortality 
of students in this College. Elimination of this magnitude con- 
stitutes a considerable wastage of public funds in the somewhat 
trial-and-error process of selection. It involves a waste of the 
economic resources of students and their parents. From the mental 
hygiene point of view the effect of failure in university work upon 
the personalities of the students is not lightly to be dismissed. 

In some universities restriction of enrollment is practised. Some- 
times this is a matter of necessity due to limited accommodation and 
equipment. At other times it is a matter of policy, the university 
making an effort to raise standards so that it may have a highly 
selected group for training. In any case it is very important that the 
methods of selecting students for entrance to engineering courses be 
such as to ensure that those most likely to be successful be chosen. 

1 College of Education.
2 College of Engineering.

At present, selection of students for admission to university
engineering courses in Canada seems to be based on senior matriculation
or Grade XII standing. That this is the best method can be
seriously questioned. Even if an intelligence test were administered 
to candidates for admission this additional data might not prove to be 
adequate. A larger proportion of cases than can easily be ignored 
may be found where good standing in the intelligence examination 
and at senior matriculation is accompanied by little success in engi- 
neering courses. Likewise, success in such courses is sometimes 
attained in cases of relatively low matriculation standing and of rela- 
tively low intelligence score. 

It would seem probable that prediction of success in engineering 
must be based on more refined measures than the results of a senior 
matriculation examination or even such results taken in combination 
with the scores on a standard test of intelligence. To explore the
possibility of obtaining such measures is the purpose of the present
investigation.


2. THE METHOD OF THE INVESTIGATION 


(A) Selection of Cases.—For purposes of study the authors decided 
to take a freshman class in engineering at the University of Saskatche- 
wan and follow them through their entire engineering course and, as 
far as possible, into their professional life after graduation. The 
group studied included all those who enrolled in first-year engineering 
for the first time in the Fall of 1937 and who wrote the final examina- 
tions at the end of the first year. Students repeating the first year 
and those who dropped out during the year because of sickness or 
other causes were not included. The present study is, therefore, 
confined to one hundred forty-four students. 

The present paper is a preliminary report on the data gathered 
during the first year of the engineering course of the group to be studied 
and is to be considered merely as an early stage of the investigation. 

(B) Tests Used and Data Available.—After an examination of the
literature on vocational selection and after reviewing the work of Cox¹
on mechanical aptitude and manual skill as well as the work of Paterson,
Elliott, Anderson, Toops, and Heidbreder² on mechanical ability
tests, the authors decided to rule out tests of manual skill and dexterity 
and to include tests of mechanical aptitude which did not seem to 
include these factors. While it is not known how far mechanical 





1 Cox: Mechanical Aptitude. Methuen & Co., London, 1928.
2 Paterson, Elliott, Anderson, Toops, and Heidbreder: Minnesota Mechanical
Ability Tests. University of Minnesota Press, Minneapolis, 1930.







aptitude enters into success in university courses in engineering, tests 
purporting to measure this ability were included with a view to explor- 
ing their possibilities. 

It should be noticed that while three tests of mechanical aptitude 
were used, they were chosen with respect to their possible selective 
value during the later years of the engineering course. The present 
study refers to the results of first-year engineering and there is little 
ground for assuming that mechanical aptitude tests are prognostic 
of success during the first year, since only one subject (descriptive 
geometry) would seem to have any relation to such abilities as might 
be measured by tests of mechanical aptitude. 

Data were obtained from the following: 

(1) The Form Relations Test of the National Institute of Indus- 
trial Psychology of Great Britain. This is a pencil and paper test 
of the paper form-board variety. At the top of each page are drawings 
of squares or cubes with certain portions cut out and the student has 
to select from a number of drawings below, the one which would fit 
into the square or cube above. 

(2) Cox Mechanical Aptitude Test M2 (Models). This test 
consists of a series of wooden models in the form of boards eight by 
eleven inches in size which present to the student in a front view 
certain slots, buttons, etc. The examiner works the model from the 
rear and the student has to indicate on prepared forms the elements 
which would be required to make the model work as it does. 

(3) Cox Mechanical Aptitude Test D (Diagrams). This test 
consists of a series of diagrammatic representations of certain com- 
binations of simple machine elements drawn on linen charts two by 
three feet in size. While the elements are combined in such a way 
as to be interconnected, they do not represent any actual machine. 
The charts are shown to the students, who are then asked to answer 
on prepared forms certain questions relating to the effect of the move- 
ment of one part upon the other parts. 

(4) In addition to the tests of mechanical aptitude the authors 
used the results of the American Council Psychological Examination, 
1937 Edition, which was administered at the first of the term to all 
freshmen in the Colleges of Engineering and of Arts and Science in the 
University. 

(5) It was felt that an attempt should be made to explore the
possibilities of a test of personality characteristics. For this purpose
the Bernreuter Personality Inventory was administered and the
results scored on the six scales: Neurotic Tendency, Self-Sufficiency,
Introversion-Extroversion, Dominance-Submission, Self-Confidence,
and Sociability.

(6) As a measure of the interests of the students the Thurstone 
Interest Inventory was used. This is scored on seven scales. The 
Physical Science Interest factor is the one which Thurstone reports 
as having positive projections for the vocation of engineer, architect, 
physicist, chemist, astronomer, and mathematician, and is the one 
which might conceivably be of most use in the selection of engineering 
students. The Commercial Interest factor has positive loadings for 
such vocations as advertiser, auto salesman, banker, manufacturer, 
office manager, factory manager, retail merchant, and stockbroker. 
The Legal Interest factor has significant positive loadings for such 
professions as lawyer, clergyman, congressman, and public speaker. 
The Physical Activity interest or Athletic factor has positive projec- 
tions for such occupations as athletic director, explorer, forest ranger, 
professional athlete, cattle rancher, and newspaper reporter. The 
Academic factor has as its principal characteristic an interest in books 
and has positive loadings for the occupations of philosopher, historian, 
librarian, college professor, high-school teacher, psychologist, and 
economist. The Descriptive factor is characterized by a general 
interest in people and things. It has positive loadings for such profes- 
sions as actor, advertiser, art critic, artist, journalist, musician, 
reporter, explorer, poet, and radio announcer. The Biological factor
has positive projections for the following occupations: biologist, botanist,
chemist, pharmacist, physician, dentist, geologist, and surgeon.

All of the above tests were administered by one of the authors except 
the American Council Psychological Examination, where one of the 
authors assisted in the administration of the test. For practical 
reasons it was not possible to give all the tests at the first of the aca- 
demic year. The Form Relations test was given in the first term and 
the Interest and Personality Inventories, Models and Diagrams in the 
second term. 

(7) In addition to the six tests described above there were available 
the Grade XII marks of the students. These marks are the result of 
standing awarded at an examination held by the Provincial Department
of Education where a serious attempt is made to secure uniformity
in marking. The average mark in Grade XII is based on the results
of at least seven subjects including English, history, French, geometry, 
trigonometry, physics, and chemistry. 










































































TABLE I.—ZERO ORDER CORRELATIONS

                             First-year  Grade XII  Intelli-                      Form
                             mark        mark       gence     Models   Diagrams   relations
Average first-year mark
Average Grade XII mark       +.61
Intelligence                 +.34        +.37
Models                       +.16                   +.26
Diagrams                     +.14                   +.22      +.45
Form relations               +.25        +.09       +.36      +.40     +.24
Neurotic tendency            +.06
Self sufficiency              .00
Introversion-extroversion    +.04
Dominance-submission         −.05
Lack of self-confidence      +.07
Lack of sociability          +.02
Physical science interest    +.26        +.15       +.09                          +.29
Commercial interest          +.06
Descriptive interest         +.06
Academic interest            +.20
Biological interest          +.10
Legal interest               +.08
Physical activity interest   +.04































(8) The results of the students in their final examinations in first- 
year engineering were made available through the courtesy of the Dean 
of the College of Engineering. The average mark was obtained from 
the results of the five main subjects of the course—mathematics 
(calculus), physics, chemistry, descriptive geometry, and English. 
Drawing was not included in the results. 

At all points of the investigation the authors received the encouragement
of Dean C. J. McKenzie, College of Engineering, University of
Saskatchewan, and every possible facility was extended by him and his
staff to make the investigation possible. The students also gave excellent
coöperation.


3. RESULTS 


The results of zero order correlations are presented in Table I. 

Comments on Table I.—First-year engineering marks had apprecia- 
ble correlations with the following: Grade XII marks, intelligence test, 
Form Relations test, and Physical Science interest. The correlation 
with Grade XII marks is within reasonable expectation, but the correlation
between the first-year marks and the results of the American
Council Psychological Examination was surprisingly low, being only
+.34. Because of this the results of the freshmen engineers were 
compared with a control group of one hundred ninety-seven Arts and 
Science freshmen who had taken the same psychological examination. 
The mean score of the one hundred forty-four freshmen engineers was 
208 on the psychological examination and of the Arts and Science 
freshmen 207.7; the mean score for the males being 206.7 and for the 
females 209.5. It is interesting, in passing, to note that in the sub- 
tests of the psychological examination the freshmen engineers did better 
than the Arts and Science male freshmen on arithmetic and analogies 
(non-verbal symbols) but poorer on artificial language, completion, 
and opposites. When the psychological examination scores of the Arts and
Science freshmen were correlated with their first-year Arts and Science
marks, the coefficient was found to be over +.50, and this has been
consistently the case for some years.
The authors are not prepared, at present, to suggest reasons for the 
difference in the correlation of first-year engineering marks with the 
American Council Psychological Examination and the correlation of 
first-year Arts and Science marks with the same intelligence test. 

With intelligence held constant,¹ first-year engineering marks were
found to correlate with Grade XII marks +.56. If this were confirmed
by later investigations it would raise important questions as to the
nature of the underlying factors giving rise to this coefficient of
correlation.

1 Partial correlations were found for use in obtaining regression equations but
have not been given in detail in this paper.

It will be noticed in Table I that first-year engineering marks 
correlate with physical science interest almost as much as they do with 
the intelligence test. Further, with intelligence held constant, first- 
year marks were found to correlate with physical science interest 
+.25, which is approximately the same as when intelligence is not held 
constant (+.26). It would seem that, for the group tested, the cor- 
relation of first-year marks with physical science interest is relatively 
independent of intelligence as measured by the American Council 
Psychological Examination. 
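
The partial correlations quoted in this section can be approximated from the zero-order coefficients alone. The sketch below applies the usual first-order partial correlation formula to the values printed in Table I and in the text. The function name `partial_r` is illustrative, and reading +.09 as the coefficient between physical science interest and intelligence is an assumption about the table layout.

```python
from math import sqrt


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Zero-order coefficients as printed in Table I and the text.
r_marks_grade12 = 0.61   # first-year marks with Grade XII marks
r_marks_intel = 0.34     # first-year marks with intelligence test
r_grade12_intel = 0.37   # Grade XII marks with intelligence test
r_marks_physci = 0.26    # first-year marks with physical science interest
r_physci_intel = 0.09    # assumed reading of the Table I cell

marks_grade12_given_intel = partial_r(r_marks_grade12, r_marks_intel, r_grade12_intel)
marks_physci_given_intel = partial_r(r_marks_physci, r_marks_intel, r_physci_intel)

print(f"{marks_grade12_given_intel:.3f}")
print(f"{marks_physci_given_intel:.3f}")
```

Because the inputs are rounded to two places, the results agree with the reported +.56 and +.25 only to within about one unit in the second decimal.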

It is interesting to note that the intelligence test correlated only
+.37 with the Grade XII marks of the freshmen engineers. This is
lower than one might expect to find. An analysis of correlations of 
each subject of Grade XII with the intelligence test yielded nothing of 
significance, the highest correlation being for science (+.33), then 
English (+.30), mathematics (+.27), French (+.26), history (+.23). 

The intelligence test had appreciable correlations with each of the 
tests of mechanical aptitude. 

An inspection of Table I also indicates that the three tests of 
mechanical aptitude have intercorrelations with each other ranging 
from +.24 to +.45. 

The Bernreuter Personality Inventory yielded nothing of signif- 
icance in correlations with first-year marks. Its results will be exam- 
ined at a later time in connection with a study of those of the group 
who are eliminated at the end of their first year and of those who will 
be eliminated subsequently. 

The Thurstone Interest Inventory yielded only two appreciable 
correlations. The data yielded by the scores on the seven scales will 
be studied later in the same way as is suggested above for the Personal- 
ity Inventory. 

The physical science interest correlated with form relations +.29, 
which is slightly more than it correlates with intelligence. When 
intelligence is held constant physical science interest correlates with 
form relations to the extent of +.27. This result merits further study
and investigation.


4. PREDICTIVE VALUE OF RESULTS 


In order to determine the degree to which the data made available
by the tests and Grade XII marks are of predictive value for vocational
counselling, partial correlations of the first, second, and third order
were worked out. The predictive value of the regression equations
up to five variables (first-year marks with intelligence, Grade XII
marks, form relations and physical science interest) has been determined
by calculating the multiple correlations. These are presented
in Table II. The multiple coefficient of correlation¹ gives the relationship
between the scores actually obtained on a test and the scores on
the same test as estimated from the regression equation made up of the
tests of the battery or team. In other words, the marks actually
obtained in first-year engineering would be expected to correlate +.61
with first-year engineering marks calculated from the regression equation
obtained from first-year marks and Grade XII marks only.
Multiple correlations involving three, four, and five variables are shown
in Table II together with corresponding standard errors and probable
errors of estimate. It will be noticed that with five variables the
multiple correlation has only been increased to +.66 as compared
with +.61 for the prediction of first-year marks from Grade XII only.

TABLE II

Variable                        Mean (M)   Standard deviation (σ)   Correlation with (1)
(1) First-year marks            60         12.7
(2) Intelligence test           208        45.5                     $r_{12} = 0.34$
(3) Grade XII marks             74         8.84                     $r_{13} = 0.61$
(4) Physical science interest   0.39       0.375                    $r_{14} = 0.26$
(5) Form relations              35         7.84                     $r_{15} = 0.25$

Variables combined              Multiple correlation    Standard error of       Probable error of
                                                        estimate σ(est. X1)     estimate (0.6745σ)
(1) and (3)                     $r_{13} = 0.61$         10.27                   6.92
(1), (2), and (3)               $R_{1(23)} = 0.62$      9.94                    6.80
(1), (2), (3), and (4)          $R_{1(234)} = 0.65$     9.68                    6.52
(1), (2), (3), (4), and (5)     $R_{1(2345)} = 0.66$    9.57                    6.45

$\bar{x}$ = predicted deviation from mean
$x$ = deviation from mean
$X$ = score

Regression equations
(1) from (3) only:      $\bar{x}_1 = 0.88\,x_3$
(1) from (2) and (3):   $\bar{x}_1 = 0.04\,x_2 + 0.82\,x_3$





1 See Garrett, H. E.: Statistics in Psychology and Education, Second Edition.
Longmans, Green & Co., 1937, p. 411.




These results indicate that Grade XII marks alone are not only the 
best single predictive measure, but are nearly as good as prediction 
from the battery of four tests. It should be pointed out that, for the 
purposes of this study, the average of five subjects of the first year was 
used. Two of these, namely English and descriptive geometry, might
be expected to differ widely in the abilities required for success. 
Further study comparing the results of the tests with individual 
classes might present much different results. In addition, the present 
study deals with first-year results only. As the students proceed to 
courses which are more strictly professional a different picture of the 
predictive value of the tests may be presented. 
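
The small gain from adding the intelligence test to Grade XII marks can be illustrated with the standard two-predictor formulas. The sketch below is not the authors' own computation. It uses only the correlations and standard deviations printed in Tables I and II, and the variable names are illustrative.

```python
from math import sqrt

# Values as printed in Table I and Table II.
r12, r13, r23 = 0.34, 0.61, 0.37   # marks-intelligence, marks-Grade XII, intelligence-Grade XII
s1, s2, s3 = 12.7, 45.5, 8.84      # standard deviations of variables (1), (2), (3)

# Standardised (beta) weights for predicting (1) from (2) and (3).
denom = 1 - r23**2
beta2 = (r12 - r13 * r23) / denom
beta3 = (r13 - r12 * r23) / denom

# Raw-score weights applied to deviations from the means.
b2 = beta2 * s1 / s2
b3 = beta3 * s1 / s3

# Multiple correlation and standard error of estimate.
R = sqrt(beta2 * r12 + beta3 * r13)
se_est = s1 * sqrt(1 - R**2)

print(round(b2, 2), round(b3, 2), round(R, 2), round(se_est, 2))
```

With rounded inputs the multiple correlation and standard error come out close to the tabled 0.62 and 9.94, and the raw-score weights agree closely with those of the second regression equation.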

In addition to following this group of students for several years, 
the authors feel that much valuable data may be gleaned from an 
intensive study of the present data. Twenty-six of the one hundred 
forty-four students studied have been required by the faculty to dis- 
continue or repeat the first year. The results of these students on the 
entire battery of tests should be compared with the results of the entire 
group and also with those of an equal number of students who were 
most successful in their first-year results. A tentative investigation of 
individual cases indicates that certain trends of value might be dis- 
covered which are not shown by the general statistical treatment. 















A SIMPLE GRAPHICAL METHOD FOR DETERMINING 
THE SIGNIFICANCE OF A DIFFERENCE* 


TEOBALDO CASANOVA 
University of Puerto Rico 


In test item validation the significance of the difference between
two proportions is usually tested through the critical ratio, whose
formula is,

$$CR = \frac{P_1 - P_2}{\sqrt{\dfrac{P_1 Q_1}{N_1} + \dfrac{P_2 Q_2}{N_2}}} = \frac{\dfrac{F_1}{N_1} - \dfrac{F_2}{N_2}}{\sqrt{\dfrac{F_1\left(1 - \dfrac{F_1}{N_1}\right)}{N_1^2} + \dfrac{F_2\left(1 - \dfrac{F_2}{N_2}\right)}{N_2^2}}} \tag{1}$$

where $P_1$ = proportion succeeding in the upper group.
$P_2$ = proportion succeeding in the lower group.
$Q_1$ = proportion failing in the upper group.
$Q_2$ = proportion failing in the lower group.
$F_1$ = number succeeding in the upper group.
$F_2$ = number succeeding in the lower group.
$N_1$ = number of cases in the upper group.
$N_2$ = number of cases in the lower group.

The general rule is to retain items having a critical ratio greater
than two or three, and to reject those of lower values as failing to
make a reliable discrimination between the upper and lower groups.
As these are usually equal in number, $N_1 = N_2 = N$, and formula (1)
becomes

$$\frac{\sqrt{N}}{CR}\,(F_1 - F_2) = \sqrt{F_1(N - F_1) + F_2(N - F_2)} \tag{2}$$
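
As a check on the algebra, the critical ratio can be computed both from the general formula (1) and from formula (2) solved for CR. The following is a minimal sketch. The function names and the counts (80 and 60 successes in groups of 100) are assumptions chosen for illustration.

```python
from math import sqrt


def critical_ratio(f1: int, n1: int, f2: int, n2: int) -> float:
    """Formula (1): difference of two proportions over its standard error."""
    p1, p2 = f1 / n1, f2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


def critical_ratio_equal_n(f1: int, f2: int, n: int) -> float:
    """Formula (2) solved for CR, valid when both groups contain n cases."""
    return sqrt(n) * (f1 - f2) / sqrt(f1 * (n - f1) + f2 * (n - f2))


# Illustrative counts (assumed): 80 and 60 successes in groups of 100.
cr_general = critical_ratio(80, 100, 60, 100)
cr_equal = critical_ratio_equal_n(80, 60, 100)
print(round(cr_general, 3), round(cr_equal, 3))
```

The two values coincide whenever the groups are of equal size, which is the condition under which formula (2) is derived.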





Assuming that the upper and lower groups are samples of the same
population, Zubin¹ substituted for $P_1$ and $P_2$ in the standard errors
in the denominator of (1), the weighted average of $P_1$ and $P_2$, thus
obtaining an expression in terms of Chi-Square. The critical value
required for significance is then found from Fisher’s tables giving the
value of $P$, instead of from the normal probability tables in terms of
deviations from the mean, as in the case of the critical ratio. Zubin’s
equation represents an ellipse in variables $F_1$ and $F_2$, while $N$ and
Chi-Square remain constant. This means that any change in the last
two named requires the construction of a new ellipse.

* The writer acknowledges his indebtedness to Edward E. Cureton, of Alabama
Polytechnic Institute, for various suggestions toward the improvement of the
manuscript.

1 Zubin, J.: “Note on a Graphical Method of Determining the Significance of
the Difference Between Two Groups Frequencies.” J. Ed. Psychol., Vol. XXVII,
September, 1936, pp. 431-444.

A simpler method was used by Votaw,¹ who plotted equation (1)
after squaring and solving for $P_1$ in terms of $P_2$. This also requires
separate drawings for different values of $N$ and $CR$. But Votaw only
considered one root of the resulting quadratic, and his method requires
the finding of many $P_1$ values, especially if $N$ is large. When the two
roots are considered, the graph is an ellipse whose construction is
more accurate and requires less time.

Squaring (2),

$$\frac{N}{CR^2}\,(F_1^2 - 2F_1F_2 + F_2^2) = F_1N - F_1^2 + F_2N - F_2^2$$

Letting $N/CR^2 = H$, and collecting,

$$(H + 1)F_1^2 - 2HF_1F_2 + (H + 1)F_2^2 - F_1N - F_2N = 0 \tag{3}$$

This is the equation of an ellipse in terms of variables $F_1$ and $F_2$,
and constants $H$ and $N$. The coordinates of its center, $a$ and $b$, are:

$$a = b = \frac{-\dfrac{N}{2}H - \dfrac{N}{2}(H + 1)}{H^2 - (H + 1)^2} = \frac{N}{2} \tag{4}$$

The axes of the ellipse are inclined at an angle $\theta$ to the coördinate
axes, $\theta$ being given by the equation

$$\theta = \tfrac{1}{2}\tan^{-1}\frac{2H}{(H + 1) - (H + 1)} = \tfrac{1}{2}\tan^{-1}\infty = 45^\circ \tag{5}$$

Translating the coördinate axes to the center of the ellipse and
rotating them through an angle of 45°, the transformed equation
of the ellipse is,

$$X^2 + (2H + 1)Y^2 - \frac{N^2}{2} = 0$$

or,

$$\frac{X^2}{\dfrac{N^2}{2}} + \frac{Y^2}{\dfrac{N^2}{4H + 2}} = 1 \tag{6}$$

This is the equation of the ellipse in standard form. The distance
from each one of the foci to the center is $\dfrac{N}{\sqrt{2 + 1/H}}$. With these
specifications the mathematical student may readily draw the ellipse
and eliminate the items whose frequencies represent points within the
ellipse.

1 Votaw, D. F.: “Graphical Determination of Probable Error in Validation of
Test Items.” J. Ed. Psychol., Vol. XXIV, December, 1933, pp. 682-686.
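
The passage from equation (3) to the standard form (6) can be written out step by step. The derivation below is a sketch of the intermediate algebra, which the text omits. The auxiliary variables $u$, $v$ and the focal distance symbol $c$ are introduced here for illustration only.

```latex
% Translate the origin to the center (N/2, N/2): F_1 = u + N/2, F_2 = v + N/2.
% The linear terms cancel and the constant term reduces to -N^2/2.
% Then rotate through 45 degrees: u = (X - Y)/\sqrt{2}, v = (X + Y)/\sqrt{2},
% so that u^2 + v^2 = X^2 + Y^2 and uv = (X^2 - Y^2)/2.
\begin{align*}
(H+1)(u^2 + v^2) - 2Huv - \frac{N^2}{2} &= 0 \\
(H+1)(X^2 + Y^2) - H(X^2 - Y^2) - \frac{N^2}{2} &= 0 \\
X^2 + (2H+1)Y^2 - \frac{N^2}{2} &= 0
\end{align*}
% Focal distance c from the squared semi-axes N^2/2 and N^2/(4H+2):
\[
c^2 = \frac{N^2}{2} - \frac{N^2}{4H+2} = \frac{N^2 H}{2H+1},
\qquad
c = \frac{N}{\sqrt{2 + 1/H}}
\]
```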


Let

$$\xi^2 = \frac{X^2}{N^2/2}, \qquad \zeta^2 = \frac{Y^2}{N^2/(4H + 2)}$$

Then

$$\xi^2 + \zeta^2 = 1 \tag{7}$$

This is the equation of a unit circle. Now let $(F_1 - F_2) = d$. If
both sides of (7) are multiplied by $d^2$ while the new variables $\xi$ and $\zeta$
are plotted in $\sqrt{N}/CR$ units, the result will be a circle with radius
equal to $d\sqrt{N}/CR$ when measured in regular units. There will be
a series of concentric circles, such as shown in the accompanying graph,
each one corresponding to the pairs of frequencies satisfying the
modified form of equation (7). The same result is obtained from
equation (2) if $[F_1(N - F_1)]^{1/2}$ and $[F_2(N - F_2)]^{1/2}$ are taken as the
variables, and referred to the new origin, but with coördinate axes
parallel to those of the original equation (3).

[Figure: Graph for the Significance of a Difference]

The practical work may be easily accomplished by drawing a
series of concentric circles in graph paper, as illustrated in the
accompanying graph. The circle whose diameter is $N$ is to be known as the
principal circle. Take $MB = F_1$ and $KA = F_2$. Then
$BD^2 = F_1(N - F_1)$, and $AC^2 = F_2(N - F_2)$. $P_1$ is a point whose
coördinates are $BD$ and $AC$. Therefore,

$$OP_1 = \sqrt{BD^2 + AC^2} = \sqrt{F_1(N - F_1) + F_2(N - F_2)}$$








The last expression is the right-hand member of equation (2). After
turning $OP_1$ to the horizontal position $OP_2$, its length may be read in
the scale in the horizontal line below the circles, in which the distances
have been laid using a scale $\sqrt{N}/CR$ times the scale of the
circles. If $d$, the obtained difference, is equal to, or greater than, the
number thus read, that difference is significant and the item is retained;
otherwise it is eliminated from the test. In the example given
$N = 100$, $F_1 = MB = 80$, $F_2 = KA = 60$, and $CR = 2.5$. Therefore,
the scale in the horizontal line below the circles is four times the scale
of the circles. In this scale the length of the line $OP_2$ is seen to be
about 15¾ (the calculated value is 15.8). But the obtained difference
$= F_1 - F_2 = 20$, and consequently, it is significant.

It is obvious from the above illustration, that if $F_1$ and $F_2 > N/2$,
$P$ is in the first quadrant; if $F_1$ and $F_2 < N/2$, $P$ is in the third
quadrant; if $F_1 > N/2$ and $F_2 < N/2$, $P$ is in the second quadrant; and if
$F_1 < N/2$ while $F_2 > N/2$, $P$ is in the fourth quadrant. But the
variables may be interchanged as in the other example illustrated in the
graph, where $F_1 = KF = 24$ and $F_2 = ME = 20$. $OP_3$ when turned
to the position $OP_4$ reads about 14½ in the $d$ scale (the calculated
value is 14.6). As $d = F_1 - F_2 = 4$, the difference is insignificant.

This procedure, although slightly lengthier than the one previously
described, may be easily understood and used by those with
meager mathematical knowledge. Its accuracy depends on the
size of the graph in relation to $N$. A straight edge and the series of
circles will aid in locating $P$ and in reading its length in the scale at the
bottom. The radius of the largest circle is $N/\sqrt{2}$ and that of the
smallest one is approximately equal to $\sqrt{N}$. When the number of
items to be validated is large, the method is more economical than
through the use of Edgerton’s¹ tables giving the value of $\sigma_p$, as the
$p$’s are not usually available, and the value of $\sigma_d$ must be calculated.

1 Edgerton, H. A. and Paterson, D. G.: “Table of Standard Errors and Probable
Errors of Percentages for Varying Number of Cases.” The J. of Applied Psychol.,
Vol. X, September, 1926, pp. 378-391.

The graph may be used for several problems if a permanent ink
drawing showing the circles is made, and the principal circle and the
numbers are marked in pencil to suit a particular problem, and then
erased and marked again for another problem.
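
The two graph readings in the worked examples can be checked numerically. The sketch below computes the length that would be read on the $d$ scale, taking the right-hand member of equation (2) and dividing by the scale factor $\sqrt{N}/CR$. The function name `required_difference` is illustrative and is not part of the original method.

```python
from math import sqrt


def required_difference(f1: int, f2: int, n: int, cr: float) -> float:
    """Length read on the d scale: the smallest F1 - F2 reaching the chosen CR."""
    op = sqrt(f1 * (n - f1) + f2 * (n - f2))   # length OP on the scale of the circles
    return op / (sqrt(n) / cr)                 # same length on the d scale


N, CR = 100, 2.5
first = required_difference(80, 60, N, CR)    # first worked example
second = required_difference(24, 20, N, CR)   # second worked example

print(round(first, 1), 80 - 60 >= first)
print(round(second, 1), 24 - 20 >= second)
```

An item is retained when the obtained difference is at least as large as the value read, which holds for the first example and fails for the second.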
When $N_1 \neq N_2$, formula (2) becomes

$$\frac{\sqrt{N_1}}{CR}\,(F_1 - RF_2) = \sqrt{F_1(N_1 - F_1) + R^3F_2(N_2 - F_2)} \tag{2a}$$

where $R = N_1/N_2$. If two circles with diameters, respectively,
equal to $N_1$ and $R^{3/2}N_2$ are designated as the principal circles, $F_2$
plotted to a scale $R^{3/2}$ times the scale of $F_1$ and the value of $d$ set
equal to $(F_1 - RF_2)$, the graph will yield results approximately equal
to those calculated through formula (2a).
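
The unequal-group case can be checked against formula (1) in the same way. The sketch below assumes that the second term under the radical in (2a) carries the factor $R^3$, which is what follows when formula (1) is multiplied through by $N_1^{3/2}$. The group sizes and counts are hypothetical.

```python
from math import sqrt


def cr_formula_1(f1: int, n1: int, f2: int, n2: int) -> float:
    """Critical ratio from formula (1)."""
    p1, p2 = f1 / n1, f2 / n2
    return (p1 - p2) / sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def cr_formula_2a(f1: int, n1: int, f2: int, n2: int) -> float:
    """Formula (2a) solved for CR, with R = N1/N2 and d = F1 - R*F2 (R^3 term assumed)."""
    r = n1 / n2
    d = f1 - r * f2
    return sqrt(n1) * d / sqrt(f1 * (n1 - f1) + r**3 * f2 * (n2 - f2))


# Hypothetical unequal groups, chosen only to exercise the formulas.
a = cr_formula_1(90, 120, 40, 80)
b = cr_formula_2a(90, 120, 40, 80)
print(round(a, 4), round(b, 4))
```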













THE RELATION OF SCHOLASTIC APTITUDE TO 
“WITHDRAWAL” PERSONALITY 


WALTER F. ST. CLAIR 


Temple University 


The relationship between scholastic aptitude, scholastic achieve- 
ment, and personality traits has been reported by many investigators. 
Stagner,¹ in summing up the results of previous studies, states that
“objective measures of personality show no linear relationship to either
academic aptitude or academic achievement.” His study confirms
this conclusion but indicates that personality influences achievement 
in an indirect way by affecting the degree to which use is made of an 
individual’s potentialities. 

The present study reports some findings secured from the records of 
freshmen at Temple University. The instruments used in measure- 
ment of personality and scholastic aptitude were the Bernreuter 
Personality Inventory and the Thurstone Psychological Examination. 
Although the Bernreuter Personality Inventory was originally con- 
structed to measure four personality traits, subsequent investigation²
has indicated that the scales which are purported to measure neurotic
tendency and introversion in reality measure the same trait.³ The
measures considered in this study are the percentile rank secured by 
individuals for neurotic tendency, self-sufficiency, and dominance 
(subsequently referred to as B1-N, B2-S, and B4-D, respectively) in 
their relation to percentile rank in scholastic aptitude which hereafter 
shall be designated as SA. 

One method of examining the inter-relations between these per- 
sonality traits and scholastic aptitude is displayed in Table I. These 
zero order and partial correlations present a view of the statistical 
effect of combining or subtracting trait scores. 

The zero order correlation coefficients reported in Table I are very 
similar to the results secured by other investigators, and indicate that 





1 Stagner, Ross: “The Relation of Personality to Academic Aptitude and
Achievement.” Journal of Educational Research, Vol. XXVI, May, 1933, pp. 648-660.

2 Bernreuter, Robert G.: “The Imbrication of Tests of Introversion-Extroversion
and Neurotic Tendency.” Journal of Social Psychology, 1934, pp. 184-199.

3 The similarity between the B1-N scale and the B3-I scale is indicated by
r = .94 secured in the present study.


the conclusions reached by Stagner and others concerning the linear 
relationship between personality traits and scholastic aptitude are 
valid. However, the relationship may be examined in greater detail by 
the method of partial correlation. Examination of the partial correlation
figures reported in this table reveals certain tendencies which
suggest that further investigation may be profitable. The following 
observations may be made: 


TABLE I.—COEFFICIENTS OF CORRELATION SECURED FOR SIX HUNDRED EIGHTY-EIGHT
FRESHMEN STUDENTS

              B2-S            B4-D            SA
B1-N                −.41            −.82             .08
              (3)¹   .00      (2)   −.26      (2)    .00
              (4)   −.44      (4)   −.83      (3)    .01
              (34)   .00      (24)  −.28      (23)  −.08

B2-S                                 .50             .19
                              (1)    .31      (1)    .25
                              (4)    .52      (3)    .26
                              (14)   .34      (13)   .29

B4-D                                                −.09
                                              (1)   −.06
                                              (2)   −.27
                                              (12)  −.15














1 (3) indicates that the coefficient of correlation is the partial correlation figure 
secured when B4-D is partialled out. In the same manner (34) indicates that 
B4-D and SA have both been partialled out. The following designations are 
used: (1) = B1-N; (2) = B2-S; (3) = B4-D; (4) = SA. 


(1) SA apparently does not affect the relationships between per- 
sonality traits as measured by the Bernreuter Inventory. When SA 
is partialled out, the correlations between B1-N, B2-S and B4-D are 
practically the same as the zero order correlations. 

(2) The only noticeable correlation between SA and a personality 
trait is the correlation with B2-S, and this is somewhat increased when 
either or both of the other personality traits are partialled out. The 
zero order correlation .19 is increased to .29 when B1-N and B4-D are 
partialled out. 

(3) When B2-S is partialled out, the negative relationship between 
B4-D and SA is increased from —.09 to —.27 and the negative correla- 
tion between B4-D and B1-N is decreased from —.82 to —.26. 









Scholastic Aptitude and ‘‘ Withdrawal” Personality 297 


(4) When B1-N is partialled out, the positive correlation between 
B2-S and B4-D is decreased from .50 to .31. 

(5) The negative relationship between B1-N and B2-S is apparently 
caused by B4-D because the removal of this factor results in r = .00. 
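The partialling used in observations (4) and (5) can be illustrated with a short sketch. It assumes the ordinary first-order partial correlation formula, which the paper does not print, and the function and variable names are illustrative only. The zero-order coefficients are those of Table I.

```python
from math import sqrt


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Zero-order coefficients taken from Table I.
r_ns = -0.41  # B1-N with B2-S
r_nd = -0.82  # B1-N with B4-D
r_sd = 0.50   # B2-S with B4-D

# Observation (5): B1-N with B2-S, with B4-D partialled out.
r_ns_d = partial_r(r_ns, r_nd, r_sd)

# Observation (4): B2-S with B4-D, with B1-N partialled out.
r_sd_n = partial_r(r_sd, r_ns, r_nd)

print(f"r(B1-N, B2-S . B4-D) = {r_ns_d:.2f}")
print(f"r(B2-S, B4-D . B1-N) = {r_sd_n:.2f}")
```

Under these assumptions the two figures agree with the .00 and .31 reported in observations (5) and (4).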

The above observations point to an interdependence of personality 
traits, particularly in B2-S and B4-D relationships. It is evident that 
the technique of partial correlation does not reveal a significant relation 
between SA and single personality traits which has not been apparent 
through other methods. Although the correlation between B2-S and 
SA is not statistically significant, it is sufficient to indicate a need for 
further investigation. 

In the study of personality by means of inventories there has been
a decided tendency to consider “personality” as being composed of
separate traits which can be designated, and thus to classify an
individual by means of separate measures. This procedure appears to be
contrary to the clinical approach, in which an attempt is made to view
the personality as a whole, as the result of a balance existing at a
particular time between the habit patterns which are in force.
These habit patterns are the result of previous experience, and can be 
expected to operate in situations which are similar in nature to those 
met previously. It is evident that this type of analysis presents 
difficulties in the field of measurement. The approach which most 
closely approximates this clinical procedure in the testing program is 
that used in the Bernreuter Personality Inventory through which more 
than one type of habit pattern is investigated by one set of responses. 
There is a tendency, however, to define and treat these response 
patterns as indications of distinct personality traits and thus, perhaps, 
to lose the advantage of consideration of the “whole” personality.
Classifications of personality into types should be done only after a 
consideration of the nature of the responses as they refer to the different 
aspects of personality. 

The response which an individual will make in a social situation is 
dictated primarily by the emotional content which similar situations 
have had previously. Lack of success in social situations germinates 
unpleasant emotional responses, and conditions the individual to react 
in a similar manner in the future. The definitions of “neurotic
tendency” and “introversion,” based on clinical experience, differ, yet
Bernreuter’s¹ study of the instruments which have been designated as





1 Bernreuter, Robert G.: “The Imbrication of Tests of Introversion-Extroversion
and Neurotic Tendency.” Journal of Social Psychology, 1934, pp. 184-199.




measures of these two separate traits indicates that the terms could be 
used interchangeably as they apply to present measurement. However,
Bernreuter¹ defines neurotic tendency: “Persons scoring high on
this scale tend to be emotionally unstable”; and defines introversion:
“Persons scoring high on this scale tend to be introverted; that is, they
are imaginative and tend to live within themselves.” These are both
definitions which designate persons who probably have had unfavorable
experiences in social adjustments, and have unpleasant emotional
responses linked with these situations. The emotional responses are
fundamentally the same, but the pattern of behavior is different. The
difference may conceivably be explained as a defensive adjustment 
which has been set up as a result of a secondary factor which one is 
inclined to consider as intelligence even in the absence of experimental 
evidence. 

Previous articles² have pointed out the desirability of interpreting
the Bernreuter Personality Inventory in terms of profile scores, and
indicated as the profile of one personality type a high B1-N score, low
B4-D score, and B2-S score which is fifteen or more points higher than
B4-D. This profile of scores appears to delineate a personality type
which we may designate as “withdrawal,” indicating lack of success in
associations with other persons and a consequent withdrawal from the
social group. Another profile which appears to indicate a personality
type consists of a high B1-N score, low B4-D and B2-S scores. This
type may be called “dependent” as a tentative classification, indicating
a person who follows group activity because of lack of ability to stand
alone.

It should be understood that, as in all personality traits, these types 
of personality may vary in degree, and any limits which are set to mark 
off the profiles at present are arbitrary. The limits which are set do 
not necessarily mean that the personality type is not present in a lesser 
degree when the combination of scores does not reach this particular 
definition. 





1 Bernreuter, Robert G.: Manual for the Personality Inventory. Stanford 
University Press, 1935. 

2 St. Clair, W. F. and Seegers, J. C.: “Certain Aspects of the Validity of the
Bernreuter Personality Inventory.” Journal of Educational Psychology, Vol.
XXVIII, October, 1937, pp. 530-540.

St. Clair, W. F. and Seegers, J. C.: “Certain Aspects of the Validity of the
F Scores of the Bernreuter Personality Inventory.” Journal of Educational
Psychology, Vol. XXIX, April, 1938, pp. 301-311.







From a card file of freshman scores, one hundred cases of each profile 
type have been picked at random using the following score delineations: 

Profile I: B1-N = 70 plus; B4-D = 30 minus; B2-S = 15 plus 
higher than B4-D 

Profile II: B1-N = 70 plus; B2-S = 30 minus; B4-D = 30 minus
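The two score delineations can be written as a small selection rule. This is a sketch only: it assumes that "70 plus" means 70 or above, "30 minus" means 30 or below, and "15 plus higher" means a difference of at least 15 points. The function name `matching_profiles` is hypothetical.

```python
from typing import List


def matching_profiles(b1_n: float, b2_s: float, b4_d: float) -> List[str]:
    """Return the profile labels whose score delineations a case satisfies."""
    profiles = []
    # Profile I: high B1-N, low B4-D, B2-S at least 15 points above B4-D.
    if b1_n >= 70 and b4_d <= 30 and (b2_s - b4_d) >= 15:
        profiles.append("I")
    # Profile II: high B1-N, low B2-S and low B4-D.
    if b1_n >= 70 and b2_s <= 30 and b4_d <= 30:
        profiles.append("II")
    return profiles


print(matching_profiles(85, 60, 20))  # satisfies Profile I only
print(matching_profiles(85, 25, 20))  # satisfies Profile II only
print(matching_profiles(50, 60, 20))  # B1-N too low for either profile
```

Read literally in this way, the two delineations are not mutually exclusive: a case such as B1-N = 85, B2-S = 28, B4-D = 10 meets both, so the sketch returns every matching label rather than choosing between them.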


Using the formula

$$r_{bis} = \frac{M_p - M_q}{\sigma} \times \frac{pq}{z}$$

a biserial correlation was computed to determine the relationship
between these profiles and SA. A significant relationship is indicated
between Profile I and SA by $r_{bis} = .401 \pm .04$. The variant in the
two profiles is the size of the B2-S score in its relation to the B4-D
score.
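A biserial coefficient of this kind can be sketched as follows, assuming the usual textbook form (M_p − M_q)/σ × pq/z, where z is the ordinate of the normal curve at the point dividing the proportions p and q. The means and standard deviation passed in below are hypothetical illustration values, not figures from the study. Only the equal split (one hundred cases of each profile, so p = q = .5) and the reported coefficient and probable error come from the text.

```python
from statistics import NormalDist


def biserial_r(mean_p: float, mean_q: float, sd_total: float, p: float) -> float:
    """Biserial r from two group means, the combined SD, and the proportion p."""
    q = 1.0 - p
    std_normal = NormalDist()
    # Ordinate of the unit normal curve at the point cutting off proportion p.
    z = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean_p - mean_q) / sd_total * (p * q / z)


# Hypothetical inputs: two equal groups whose means differ by 0.4 SD.
r_example = biserial_r(mean_p=60.0, mean_q=50.0, sd_total=25.0, p=0.5)
print(f"illustrative r_bis = {r_example:.3f}")

# Reported values: the coefficient is about ten times its probable error.
print(f"r / PE = {0.401 / 0.04:.1f}")
```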

The following discussion presents the results of further investigation
of the relationship between profile type of personality and SA. Measures
of central tendency and dispersion are displayed for groups of
students who have various combinations of personality trait scores.


TABLE II.—MEANS, MEDIANS, AND STANDARD DEVIATIONS OF SCHOLASTIC
APTITUDE SCORES OF FRESHMAN STUDENTS

                                            Number
Classifications                             of cases   Median    Mean      SD

Total group                                   1098      57.6     56.8     ±26.5
[classification illegible]                     324      60.9     58.7     ±27.2
[classification illegible]                     113      65.7     63       ±27.3
B1-N = 70 plus; B2-S & B4-D = 20 minus          99      52.8     52.5     ±27.6
B1-N = 70 plus; B2-S = 15 > B4-D               121      73.7     68.5     ±24.8
B1-N = 70 plus; B2-S = 30 > B4-D                72      79       75.6     ±21.1
B1-N = 90 plus; B2-S = 15 > B4-D                45      80.8     70.1     ±26.6
B1-N = 90 plus; B2-S = 30 > B4-D                29      87       78.5     ±19.3
B1-N = 70 minus; B2-S = 15 > B4-D              122      67.3     63.6     ±24.1



















The evidence presented in Table II again indicates a relationship 
between Profile I and SA through a very noticeable difference in 
averages obtained in the groups having this combination of personality 
scores. The mean score for one hundred twenty-one cases of Profile I 
is 68.5, which is 11.7 above the mean of the entire class, and the median
of 73.7 is 16.1 above the median of the entire class. It is also signifi- 
cant that as the limitations of Profile I are increased, and the personal- 
ity type becomes more apparent, the average scholastic aptitude of 
these groups is considerably increased (mean = 78.5; median = 87). 
It is apparent that the combination of scores considered above as 
Profile II yields results which are below the general average. The 
group of ninety-nine cases representing this profile have a mean of 52.5 
and a median of 52.8. As has been pointed out previously, the differ- 
ence in these profiles as indicators of personality types is found in the 
relative size of the score for self-sufficiency. 

The significance of the differences obtained may be studied more 
thoroughly by the method suggested by Garrett! for determining the 
reliability of the difference between two averages. The difference
between the means is divided by the standard error of the difference, 
and with the aid of a table of probability integrals the significance of 
the difference between the central tendencies and deviations is trans- 
lated into certainty of occurrence that the selected groups will differ 
from the total group. 
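The computation just described can be sketched for the Profile I group (121 cases, mean 68.5, SD 24.8) against the total group (1098 cases, mean 56.8, SD 26.5). The standard-error formula used here, the square root of the sum of the two squared standard errors of the means, is an assumption; the paper does not print it, although it reproduces the tabled standard error of 2.39. The normal-curve conversion to "chances in ten thousand" is likewise an assumed reading of the probability table, and the function names are illustrative.

```python
from math import erf, sqrt


def se_of_difference(sd_a: float, n_a: int, sd_b: float, n_b: int) -> float:
    """Standard error of the difference between two means (assumed formula)."""
    return sqrt(sd_a**2 / n_a + sd_b**2 / n_b)


def chances_in_ten_thousand(ratio: float) -> float:
    """Normal-curve probability of a true difference in the observed direction, per 10,000."""
    return 10_000 * 0.5 * (1 + erf(ratio / sqrt(2)))


# Profile I group compared with the total freshman group.
diff = 68.5 - 56.8
se = se_of_difference(24.8, 121, 26.5, 1098)
ratio = diff / se

print(f"difference = {diff:.1f}")
print(f"standard error of difference = {se:.2f}")
print(f"index of reliability = {ratio:.2f}")
print(f"chances in ten thousand = {chances_in_ten_thousand(ratio):.0f}")
```

Under the same assumptions, an index of 1.50 corresponds to about 9,332 chances in ten thousand and an index of 1.11 to about 8,665.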

Table III indicates that the reliability of the differences of the 
mean SA scores for the two profiles which have been used is undoubt- 
edly greater than chance would allow. An index of reliability greater 
than three is considered statistical evidence of reliability. There is 
not one chance in ten thousand that the differences secured by using 
Profile I could occur by chance, and there are very few chances that 
the low average of Profile II is not reliable. It is also important to 
note that the group which has B1-N scores below seventy and B2-S 
scores fifteen points greater than B4-D has a very reliable difference 
in means, because it is very possible that the limitations used in select- 
ing Profile I do not completely delineate the particular personality 
type which is under consideration. The relationship between the 
B2-S and B4-D scores is undoubtedly of greater significance than the 
size of the B1-N score. As the profile used to select the personality 
which is called ‘‘ withdrawal’? becomes more pronounced and more 





1 Garrett, Henry E.: Statistics in Psychology and Education. Longmans Green
& Co., New York, 1926, pp. 128-133, p. 91.













extreme, it is evident that there is a higher average SA and that these 
are statistically reliable, even though the reliability of these differ- 
ences is affected by the smaller size of the group. 


TABLE III.—DIFFERENCES IN MEANS, STANDARD ERRORS OF DIFFERENCES, AND
RELIABILITY OF DIFFERENCES IN SCHOLASTIC APTITUDE SCORES

                                          Number   Differences   Standard    Index of   Reliability:
Classifications                           of       of means      error of    relia-     chances in
                                          cases    from total    differ-     bility     ten
                                                   group         ences                  thousand

[classification illegible]                 324        1.9         1.71        1.11        8,665
[classification illegible]                 113        6.2         2.69        2.30        9,893
B1-N = 70 plus; B2-S and B4-D 20 minus      99       −4.3         2.88        1.50        9,332
B1-N = 70 plus; B2-S = 15 > B4-D           121       11.7         2.39        4.89       10,000
B1-N = 70 plus; B2-S = 30 > B4-D            72       18.8         2.60        7.24       10,000
B1-N = 90 plus; B2-S = 15 > B4-D            45       13.3         4.04        3.29        9,995
B1-N = 90 plus; B2-S = 30 > B4-D            21       21.7         3.75        5.78       10,000
B1-N = 70 minus; B2-S = 15 > B4-D          122        6.8         2.32        2.92        9,982




















SUMMARY AND CONCLUSIONS 


This paper has attempted to throw some light on the relationship 
between personality as measured by the Bernreuter Personality Inven- 
tory and scholastic aptitude as measured by the Thurstone Psycho- 
logical Examination. Through the usual methods of correlation it 
has been found that the conclusions reached by certain previous 
investigators are substantiated, and that there is no linear relation- 
ship between personality traits and scholastic aptitude as so measured 
when personality is considered in separate traits. Examination of 
results secured by partial correlation technique yields convincing 
evidence of interrelationship between personality traits: When domi- 
nance is partialled out the negative correlation between neurotic 
tendency and self-sufficiency drops from —.41 to .00; when self- 
sufficiency is partialled out the negative correlation between dominance
and neurotic tendency drops from —.82 to —.26; when




neurotic tendency is partialled out the positive correlation between 
self-sufficiency and dominance is decreased from .50 to .31. There 
is a correlation of .29 between scholastic aptitude and self-sufficiency 
when neurotic tendency and dominance are held constant. 

Two profiles have been tentatively delineated. Cases included 
in Profile I (withdrawal) have scores as follows: B1-N = 70 plus, 
B4-D = 30 minus, B2-S = 15 plus higher than B4-D. Profile II
(dependent) is characterized by scores: B1-N = 70 plus, B2-S and 
B4-D = 30 minus. 

The biserial correlation computed from samples of Profile I and
Profile II indicates a definite relation between Profile I and scholastic
aptitude. Although this correlation coefficient of .401 is not
particularly large, it is ten times the probable error.

The further investigation of the differences between the means of 
samples of Profiles I and II and the mean of the total freshman group 
results in more conclusive evidence that differences in scholastic apti- 
tude exist between groups selected on the basis of personality types 
and that the differences in the means cannot be attributed to chance,
and would occur consistently.

While no linear relationship can be established between isolated
personality traits, the evidence of the paper indicates that when
personality is viewed as a whole or in “profile” there is a definite
relationship with scholastic aptitude. The profile which is positively
related to scholastic aptitude is that which has previously been
designated as indicative of “withdrawal” and the profile which is negatively
associated with scholastic aptitude is that referred to as “dependent.”










GUIDANCE AND TRANSFER IN PART AND WHOLE 
LEARNING OF THE DISC TRANSFER PROBLEM 


THOMAS W. COOK 
Acadia University, Wolfville, N. S., Canada 


The structure of Peterson’s disc transfer puzzle is such that each 
problem (except the two-block) can be divided into two simpler prob- 
lems, plus one move. Thus the four-block problem may be solved 
in the minimum number of fifteen moves by viewing it as made up of 
two similar seven-move problems with one move of the base inter- 
polated between the end of the first group and the beginning of the 
second group of seven moves.* 
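The decomposition described above can be sketched recursively. The sketch assumes the familiar three-peg rules (one block moved at a time, never a larger block on a smaller one), which are not restated here; the function name `solve` and the tuple layout are illustrative.

```python
from typing import List, Tuple

# A move is (block size, from peg, to peg); block 1 is the smallest.
Move = Tuple[int, str, str]


def solve(n: int, src: str, dst: str, spare: str) -> List[Move]:
    """Minimum-move solution for transferring n blocks from src to dst."""
    if n == 0:
        return []
    return (
        solve(n - 1, src, spare, dst)    # upper blocks out of the way
        + [(n, src, dst)]                # one move of the base
        + solve(n - 1, spare, dst, src)  # upper blocks onto the base
    )


four_block = solve(4, "A", "B", "C")
first_part, base_move, second_part = four_block[:7], four_block[7], four_block[8:]

print(len(four_block))                              # total moves
print(first_part == solve(3, "A", "C", "B"))        # three-block problem, A to C
print(base_move)                                    # base moved from A to B
print(second_part == solve(3, "C", "B", "A"))       # three-block problem, C to B
```

On these assumptions the four-block solution is exactly a seven-move A-to-C problem, one move of the base, and a seven-move C-to-B problem, and an n-block problem takes 2^n − 1 moves.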

It is accordingly possible to apply to the disc transfer problem the 
standard technique for investigating the relative economy of whole 
and part learning, and thereby advance a step toward finding whether 
rational problems can be most economically attacked as a whole or 
piecemeal. Moreover, since the principles involved in the disc transfer
puzzle are similar for problems of any size, such an investigation
might be expected to yield some data on the rôle of generalization in
transfer, and the manner in which such generalization might be
facilitated by suitable instructions.

In the present experiment three groups were used. Group I 
(twenty-eight girls, thirty-five boys) learned the four-block problem 
as a whole by the method described in a previous paper.¹ Group II
(sixteen girls, twenty-nine boys) used the part method; i.e., they first
learned to move three blocks from A to C in seven moves.* They
then mastered a second three-block problem, but in this instance 
beginning at C and ending at B. Finally they were set the same task 
as Group I, to transfer four blocks from A to B in fifteen moves. 
For Group III (eighteen girls, twenty-eight boys), the instruction 
group, the procedure was identical with that for Group II, except that 
after the completion of the second three-block problem, and before 
beginning the four-block problem, the subject was asked to read the 
following typewritten instructions: 


Notice that the four-block (fifteen moves) problem can be broken up into 
the first two problems, A-C and C-B (seven plus seven equals fourteen moves) 
plus one move (base from A to B) by: 


1. Disregarding the base and moving the three upper blocks from A to C. 





* For the significance of the letters A, B, and C, see 1, p. 288. 


2. Then moving the base from A to B. 
3. Finally moving the three blocks now on C to B. 


After a subject stated that he understood the instructions, they 
were rehearsed orally by the experimenter. Otherwise, the procedure 
was similar to that used in the earlier investigation. A slight change 
was made in the apparatus: The largest, or base block, used only in 
the four-block problem, was painted red instead of black. All sub- 
jects were students at Acadia University. 


EXPERIMENTAL DATA 


(A) Whole versus Part Method.—Table I shows only small and 
unreliable differences between the relative economy of the whole and 
part methods. Neither in errors, moves, time, nor trials are there 









































TABLE I

                                                       Time in
Problem                          Errors     Moves      seconds     Trials

Whole method, Group I
  A to B (four-block)             66.5      188.9       558.0       8.2

Part method, Group II
  A to C (three-block)             9.1       26.1       124.6       2.4
  C to B (three-block)             2.2       12.6        32.9       1.5
  A to B (four-block)             50.0      128.4       415.0       5.2
  Total                           61.3      167.0       574.4       7.2

Instruction method, Group III
  A to C (three-block)             8.6       25.7       114.7       2.4
  C to B (three-block)             2.6       13.0        35.6       1.5
  A to B (four-block)             16.2       62.7       162.8       3.1
  Total                           27.4      101.4       313.1       5.1

















significant differences between the averages for Group I and the 
combined data for the three problems solved by Group II. The 
instruction method, however, is markedly superior to either of 
the other two methods. Group III average (per subject) for all three 
operations only 27.4 errors, 101.4 moves, 313.1 seconds, and 5.1 trials, 
as compared with 66.5 errors, 188.9 moves, 558.0 seconds, and 8.2 







trials for Group I, and slightly smaller values for Group II. The 
reliability of this superiority of the instruction method can be esti- 
mated most satisfactorily from the differences between Group II and 
Group III on the four-block problem, since these two groups used 
identical procedures and made almost identical records on the two 
three-block problems which constitute the parts of the four-block 
problem. Thus, any reliable difference between their performances
on the four-block problem must be attributed to the instructions given
Group III. For errors, time, and trials, the values of σ diff. between the
averages for Group II and Group III on the four-block problem are 7.86,
52.8, and .7, respectively, giving critical ratios of 4.4, 4.8, and 3.0.
It seems unnecessary to compute the σ diff. for the moves measure,
since the latter is a compound of the error and trial measures.

A further bit of evidence tending to show that Group III is sig- 
nificantly superior to the other two groups on the four-block problem 
is that while only one subject of sixty-three in Group I and three 
subjects of forty-five in Group II solved the four-block problem with- 
out error on the first trial, nineteen of the forty-six subjects in Group 
III performed this feat. 

(B) Transfer.—Besides the just-described influence of written and
oral guidance upon the solution of the four-block problem by Group
III, there is evidence of transfer from the two three-block problems
to the four-block problem in Group II, and from the first (A to C)
to the second (C to B) three-block problem in Group II and Group III.
The case for transfer to the four-block problem in Group II is not
particularly strong, as the differences between the records for the
four-block problem with Group I and Group II (for example, 66.5 as
compared with 50.0 errors) are not statistically reliable. On the other
hand, the differences between the averages for the three-block problems
A to C and C to B are more significant. The ninety-one subjects
in Groups II and III solve the first problem with 8.8 errors, 25.9 moves,
119.7 seconds, and 2.4 trials, and the second problem with 2.4 errors,
12.8 moves, 34.3 seconds, and 1.5 trials. The error averages have a
σ diff. of 1.47, with a critical ratio of 4.2. The distributions for
moves, time, trials, and error scores are so similar that presentation of
further critical ratios seems superfluous.
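The combined averages for the three-block problems can be checked by weighting each group mean in Table I by its number of subjects. The sketch assumes the ninety-one subjects are the forty-five of Group II and the forty-six of Group III, the only groups that attempted the three-block problems. The function name `pooled_mean` is illustrative.

```python
from typing import Sequence


def pooled_mean(means: Sequence[float], sizes: Sequence[int]) -> float:
    """Average of group means weighted by the number of subjects in each group."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


sizes = (45, 46)  # Group II, Group III

# First three-block problem (A to C), then second (C to B), from Table I.
errors_first = pooled_mean((9.1, 8.6), sizes)
errors_second = pooled_mean((2.2, 2.6), sizes)
moves_first = pooled_mean((26.1, 25.7), sizes)
moves_second = pooled_mean((12.6, 13.0), sizes)

print(f"errors: {errors_first:.1f} then {errors_second:.1f}")
print(f"moves:  {moves_first:.1f} then {moves_second:.1f}")
```

With these weights the error and move averages come out at the 8.8 and 2.4 errors and the 25.9 and 12.8 moves quoted in the text.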


DISCUSSION 


The relation between amount of material and difficulty of learn- 
ing is one important factor affecting the relative economy of the whole 




and part methods. In the present experiment with the disc transfer 
problem the marked difference in difficulty between the three-block 
and four-block problems¹ gives the part method a great initial advantage.
With Group II, however, this advantage is offset by the fact
that subjects get relatively little help in the combining act from their 
previous experience with the two parts. That the failure of the sub- 
jects in Group II to transfer their experience with the three-block 
problems is a consequence of their inability to see the smaller problems 
as true (and similar) parts of the larger problem, is suggested by the 
degree to which Group III profited by the instructions given. For 
these instructions were specifically directed toward demonstrating 
that relation in as clear and concrete a fashion as possible. 

The need of comprehension of the pattern for a solution of the disc
transfer problem is clearly indicated by the large percentage of errors 
in the initial phase of the problem. We have seen that the three-block 
and four-block problems may be divided into two parts consisting of 
three or seven moves each, with one move of the base between the 
end of Part I and the beginning of Part II. Disregarding the inter- 
mediate move of the base, which offered no difficulty to any subject, 
we find that almost all errors occur in Part I. In fact, with the three- 
block problem only two of one hundred fifty subjects made any errors 
in the second part. With the four-block problem I at first recorded
only total moves, and total time per trial. But more detailed data
obtained from seventeen subjects in Group I show a first trial average
of 16.0 errors in Part I and 1.4 errors in Part II. For the fifteen-move
problem with Group II, the first trial averages are 17.0 for Part I 
and 1.7 for Part II, and the average number of errors per subject for 
all trials is 43.7 for Part I and 7.5 for Part II. 

For the instruction group (III), however, the first trial with the 
four-block problem shows 3.6 errors in Part I and 3.5 errors in Part II, 
and eighteen of the twenty-seven subjects who failed to solve the 
problem on the first trial made more errors in Part II. Moreover, 
when we compare the average errors for the whole, part, and instruc- 
tion methods for those subjects who took at least three trials to solve 
the four-block problem, we find that the whole method shows 17.6, 
18.4, and 10.8 errors; the part method 15.7, 18.4, and 8.2 errors, and 
the instruction method 4.3, 10.2, and 6.9 errors on the first three trials. 
Thus in the whole method the number of errors decreases from trial to 
trial, while the part method gives a small increase and the instruction 
method a marked increase in trial 2 over trial 1. Evidently the prob- 







lem must clear up at once or the influence of the instructions tends to 
disappear. 

One cannot, of course, conclude from these results that verbal
guidance plays an equivalent rôle in the whole-part problem with
other types of material. I have argued previously that one factor in
the greater relative economy of the part method in our studies of
maze learning, as compared with the results of Hanawalt and Pechstein,
is the use of visual and verbal guidance to inform the subject
that each initially-learned section was part of a larger whole.²
Otherwise, unless each part has some distinctive local sign it will not be
recognized (as Hanawalt’s results indicate) when subjects who have 
previously learned the parts separately are set the task of tracing 
the total pattern. With meaningless material it may be that such 
redirection of mental set is the limit of usefulness of verbal guidance. 
But in the disc transfer problem words have an additional function. 
They serve to indicate a similarity between two sections of the 
puzzle. That is, they aid the subject in discovering a relation of 
another sort from the belonging and temporal sequence which exhaust 
the significance of the word part as applied to meaningless materials. 

Since words are our chief tools in rendering the comprehension of 
relations explicit and stable, and in utilizing the understanding of 
simpler relations in the discovery of the more complex, there would 
seem to be no limit (except the relations present in the material) to the 
possibilities of usefulness of verbal guidance in the solution of rational 
problems. With this hypothesis the classical experiment of Judd and
Scholchow³ and the more recent work of Waters,⁵ as well as the
present investigation, are in accord. On the qualitative side, methods
in vogue in the schoolroom add support to the view and illustrate the
practical value of further investigation.


SUMMARY 


The four-block disc transfer problem can be viewed as made up of 
two three-block problems, plus one move of the base. When two
such three-block problems are successively learned there is a large
amount of transfer from the first to the second, and a small amount
of subsequent transfer to the four-block problem. The three learning
acts (part method), however, give total scores about equal to those 
for the four-block problem with naive subjects (whole method). 
Instructions explaining the relation of three-block to four-block 
problems, when introduced after mastering the former and before 




attempting the latter, greatly facilitate its solution. The results 
indicate that verbal guidance may be of considerable assistance 
in the solution of the disc transfer problem. 


REFERENCES 


1. Cook, T. W.: ‘Amount of Material and Difficulty of Problem Solving, II. 
The Disc Transfer Problem.” Journal of Experimental Psychology, Vol. xx, 
1937, pp. 288-296. 

2. Cook, T. W.: “Factors in Whole and Part Learning a Visually Perceived Maze.” 
Journal of Genetic Psychology, Vol. 11, 1936, pp. 3-32. The relations between 
the work of Cook, Hanawalt, and Pechstein are discussed on pages 26-28. 

3. Judd, C. H.: “The Relation of Special Training to General Intelligence.” 
Educational Review, Vol. xxxvi, 1908, pp. 36-39. 

4. Seashore, C. E. and Seashore, R. H.: Elementary Experiments in Psychology. 
New York: Henry Holt and Co., 1935, p. 237. The authors subject the disc 
transfer problem to detailed analysis. 

5. Waters, R. H.: “The Influence of Tuition upon Ideational Learning.” Journal 
of General Psychology, Vol. 1, 1928, pp. 534-549. 
















A FURTHER STUDY ON THE ROLE OF THE BASAL 
METABOLIC RATE IN THE INTELLIGENCE OF 
CHILDREN 


RALPH T. HINTON, JR. 


Mooseheart Laboratory for Child Research, Mooseheart, Ill. 


In the October, 1936, issue of this Journal¹ the author reported on
a study in which he correlated the scores made on two intelligence 
tests with the basal metabolic readings of ninety school children of 
both sexes between the ages of five and fifteen years. The present 
study is an outgrowth and an enlargement of the previous one. 


PROCEDURE 


In all, two hundred subjects equally divided between boys and 
girls were used. There were twenty cases at each year level. The 
age range was between six and fifteen years inclusive. There was no 
conscious selection of cases, and the only requirement was that each 
child pass a very thorough physical examination. This was done to 
eliminate any factor other than a thyroid disturbance that might have 
a bearing on the results, for, in the first place, it is universally recog- 
nized that, other things being equal, the basal metabolic rate is one 
of the most reliable criteria of the activity of this gland. In this way 
it was hoped that the results would become more meaningful than if 
several other factors were involved. In the second place, it was 
intended to treat several children with low intelligences and metabo- 
lisms with thyroid and to compare these results with those of the 
present investigation. This was done and will be reported on at a 
later date. 

The conditions of the experiment itself were quite simple. Most 
of the children came from private schools in and around Manteno and 
Kankakee, Illinois, with a few from an orphanage in Evanston, Illinois. 
The only mechanically controlled instrument was a well-known com- 
mercial machine which measures calorimetry indirectly. The mental 
examinations consisted of the Stanford Revision of the Binet-Simon 
Mental Test and the Grace Arthur Point-Performance Scale. 

The procedure of experimentation was as follows: The children 
whose metabolisms were to be tested on any particular morning were 





1 Hinton: “The Rôle of the Basal Metabolic Rate in the Intelligence of Ninety
Grade School Students.” Jour. of Educ. Psych., Vol. XXVII, 1936, pp. 551-554.


kept in bed until the examiners arrived. No breakfast was permitted 
and they were allowed to sleep, if possible, until the actual testing 
began. Some were able to sleep that long, others were not; but, in 
either event, all were compelled to stay in bed without physical 
exercise. This was done to insure complete basal conditions. In 
spite of the fact that many authorities do not regard a certain amount 
of exercise as injurious to the results of such a test, the author limited 
it as strictly as possible. 

Every child was given three metabolism tests. The first, which 
was employed to accustom the subject to the machine, was discarded 
and the average of the last two was judged to be the basal metabolic 
rate. By beginning very early, it was possible to examine several 
children in one period of time, but no more than six children were ever 
tested on any particular morning. 

As soon as the metabolism tests were finished, the children were 
allowed to dress and have breakfast. After this, the intelligence tests 
were given. As a possible check against such factors as preconceived 
ideas, prejudices, or previous knowledge, the following procedure was 
worked out: Two assistants gave the metabolism tests, another 
administered the Performance, while the author tested the children on 
the Binet. 


RESULTS 


TABLE I.—CORRELATIONS, MEANS, AND STANDARD DEVIATIONS FOR THE TOTAL
GROUP AND THE MEANS AND STANDARD DEVIATIONS FOR THE SEPARATE
SEXES FOR (1) BINET, (2) PERFORMANCE, AND (3) METABOLISM SCORES

          Total group (N = 200)                  Males (N = 100)     Females (N = 100)
        1       2       3       M        σ         M        σ          M        σ

1                            100.10    21.50    100.45    22.20      99.75    20.80
2     .515                   102.05    24.50    101.15    24.90     102.95    24.10
3     .706    .741            −4.30    13.00     −4.15    12.95      −4.45    13.00

DISCUSSION 


It will be noticed that the correlations of Table I correspond closely
to those of the previous investigation. That is, when we employed
ninety subjects, the r between Binet IQ’s and metabolism scores was
.736 (PE .032). The present r was .706 (PE .023). Moreover, the
first Metabolism-Performance r was .661 (PE .040), while the r of the
present study was found to be .741 (PE .021). In general, then, we
may say that the results of this investigation bear out those of the
previous one. The correlation between the Binet and Performance
scores was .515 ± .036.
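
The probable errors quoted above can be checked against the conventional formula for the probable error of a correlation coefficient, PE = 0.6745(1 - r^2)/sqrt(N). The sketch below assumes that this is the formula the author used, which the text does not state; the function name is hypothetical. Under that assumption the computed values fall within about .001 of the printed ones.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Conventional probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# (r, number of subjects, probable error as printed in the text)
reported = [
    (0.736, 90, 0.032),   # earlier study, BMR with Binet
    (0.706, 200, 0.023),  # present study, BMR with Binet
    (0.661, 90, 0.040),   # earlier study, BMR with Performance
    (0.741, 200, 0.021),  # present study, BMR with Performance
]

for r, n, printed in reported:
    computed = probable_error_of_r(r, n)
    print(f"r = {r:.3f}, N = {n}: computed PE = {computed:.4f}, printed PE = {printed:.3f}")
```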


TABLE II. CORRELATIONS BETWEEN METABOLISM AND BINET IQ’S AND BETWEEN
METABOLISM AND PERFORMANCE IQ’S FOR EACH YEAR LEVEL BETWEEN
SIX AND FIFTEEN YEARS

            BMR and Binet       BMR and Performance
  Year       r       PE            r       PE
    6      .796     .057         .765     .066
    7      .799     .056         .791     .058
    8      .785     .061         .781     .061
    9      .778     .062         .831     .048
   10      .768     .064         .793     .058
   11      .715     .077         .757     .067
   12      .615     .098         .658     .089
   13      .588     .103         .514     .116
   14      .566     .107         .698     .080
   15      .528     .112         .483     .121

















At that time the author was not inclined to make any sweeping 
predictions about the relationship between basal metabolism and 
intelligence of children who were negative to any pathological con- 
dition except a possible thyroid disturbance. But since the present 
correlations closely follow the original figures, with no significant 
difference between them, there no longer seems to be a great deal of 
doubt. The small number of former cases might have been an 
influencing factor, but even when the group of subjects was extended 
to include two hundred there was no appreciable change in the results. 
Even the change due to correlation between mental age and basal 
metabolism with chronological age held constant was not enough 
to change greatly the results. 
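
The phrase "with chronological age held constant" presumably refers to a first-order partial correlation. A minimal sketch of the standard formula follows; the function name and all three input values are hypothetical, since the paper does not print the correlations that entered the computation.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical inputs for illustration only; none of these values come from the paper.
r_mental_age_bmr = 0.60  # mental age with basal metabolic rate
r_mental_age_ca = 0.50   # mental age with chronological age
r_bmr_ca = 0.10          # basal metabolic rate with chronological age

print(round(partial_r(r_mental_age_bmr, r_mental_age_ca, r_bmr_ca), 3))
```

When the third variable is only weakly related to one of the other two, as in these made-up figures, the partial coefficient stays close to the original one, which is the pattern the paragraph above describes.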

Moreover, when one considers that the group of subjects was 
entirely unselected, the results assume even more meaningful pro- 
portions. As we have pointed out, the only requirement was that 
each child be in good physical health. This naturally brings up the 
point of why we were so careful to eliminate factors other than thyroid 
disturbances. If the subjects had been selected regardless of their 
physical health, we might have secured the same metabolic scores, but
they would not have meant a great deal. We know that other conditions
have a bearing on this rate, but we also know that no other
disturbance gives the same basal reading day after day and month 
after month; these other factors cause a fluctuation, rather than allowing
it to remain constant.¹ This being so, an unexamined child would
not have given a true indication of the relation between the basal metabo- 
lic rate and intelligence, and we would have been correlating changing 
against stable scores. Therefore, selecting only those subjects who 
had no other afflictions, we were able to compare a permanent charac- 
teristic, the IQ, with the only other factor that gives a constant 
metabolic rate—the activity of the thyroid gland. 

Referring back to Table II, which deals with the separate year- 
level correlations, we notice a rather interesting fact in connection 
with both mental tests. In the case of the Binet, there is a high 
correlation with metabolism up to, and including, ten years of age. 
Moreover, there is a marked tendency for this relationship to decrease 
gradually. At eleven years, however, there is a sharp break in correlation,
and from then on the r’s become markedly less in size.
Previous to this time there was never a difference of more than .015
points between any of the coefficients, but from then on we see that
the differences are greatly increased, sometimes as much as .100.
The Performance IQ-basal metabolism relationship is not as clearly
defined, but the same general tendency is present, even as far as the
same “break” at eleven years is concerned.
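
The year-to-year changes described here can be read directly off Table II. The short sketch below simply differences successive coefficients; the values are those printed in the table, and the helper name is hypothetical.

```python
# Correlations between BMR and each IQ measure by year level, as read from Table II.
binet = {6: 0.796, 7: 0.799, 8: 0.785, 9: 0.778, 10: 0.768,
         11: 0.715, 12: 0.615, 13: 0.588, 14: 0.566, 15: 0.528}
performance = {6: 0.765, 7: 0.791, 8: 0.781, 9: 0.831, 10: 0.793,
               11: 0.757, 12: 0.658, 13: 0.514, 14: 0.698, 15: 0.483}


def successive_changes(r_by_age):
    """Change in r from each year level to the next, rounded to three places."""
    ages = sorted(r_by_age)
    return {(a, b): round(r_by_age[b] - r_by_age[a], 3) for a, b in zip(ages, ages[1:])}


for label, table in (("Binet", binet), ("Performance", performance)):
    print(label)
    for (younger, older), change in successive_changes(table).items():
        print(f"  {younger} -> {older}: {change:+.3f}")
```

For the Binet series the changes through age ten stay within .015, while the drop from ten to eleven is .053 and from eleven to twelve is .100. The Performance series is more irregular, as the text notes.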

The explanation of such a condition is not easy, but it would appear 
that here we are dealing with the many variables which enter in to 
upset and change physical and glandular functionings at adolescence. 
In general, we know what happens to the glandular balance of the 
organism when this period of life is reached. Consequently, the 
metabolism of the body may change very radically in a short period 
of time and what was previously true of the physical functionings no 
longer holds good. Therefore, it does not seem unusual that there 
should be less connection between basal metabolism and intelligence 
as the individual grows older. Perhaps this would account for the 
fact that in adults there is no connection at all between these two 
variables. After all, a metabolism test merely tells us the condition 
of the body at a particular time, not how long this condition has 
existed. If a change in BMR occurs at adolescence, as it probably
does, it may not have lasted long enough to be recorded in the person’s
mental life.

1 Sloan, E. P.: The Thyroid. Charles C. Thomas, Springfield, Ill., 1936.

With regard to other results, we note that there was no reliable 
difference between male and female test scores, whether it be intel- 
ligence ratings or metabolism scores. In connection with the basal 
metabolic results we notice an interesting fact; namely, that the 
mean metabolic score was —4.30 and that one hundred twenty-six, 
or sixty-three per cent, of the cases fell on the minus side of the 
metabolic chart. At a first glance this might seem a little strange, 
but it ceases to be so when one remembers the particular part of the 
country in which the investigation took place—the area immediately 
connected with the Great Lakes. This is known as the “goiter belt,” 
so there is nothing unusual in the fact that the majority of our cases 
show negative BMR’s. In effect, this bears out the contention of 
medical authorities that the greater part of the population of this 
area have minus metabolisms. If this is as generally true as these 
results seem to indicate, then such a point would be of considerable 
interest and importance to educators in this part of the country. 


CONCLUSIONS 


(1) In spite of the fact that the results of this investigation indi- 
cate a close relationship between the basal metabolic rate and intel- 
ligence of children who are negative to any pathological condition, 
except a possible disturbance of the thyroid gland, the author is well 
aware of the dangers in correlating any two such variables. Never- 
theless, the results are of such a consistent nature that we feel that the 
basal metabolic rate is a factor that should be taken into account in 
the clinical picture of any child. 

(2) In spite of the fact that the statistical relationship between 
metabolism scores and intelligence test results was linear, the highest 
metabolic scores were not associated with the highest IQ’s. The 
upper ranges of IQ’s on both intelligence tests were found connected 
with metabolisms between 0 and +4. After the upper normal 
limit of +10 had been reached, there tended to be a lessening in the 
size of the associated IQ’s. 

(3) There was no significant statistical difference to be found 
between the results of the first correlations (BMR’s and intelligence 
test IQ’s) and the results of the second correlations (BMR’s and mental 
age on the intelligence tests, chronological age held constant). 









(4) By dividing the subjects into year levels, we found a higher
degree of relationship in younger children between intelligence scores
on the two tests and basal metabolism than in children of more
advanced age. Moreover, this correlation shows, on the average, a
steady decline in relationship as chronological age increases.

(5) The general average of the basal metabolic rates was inclined
to be slightly on the minus side of the metabolism chart, somewhere
in the neighborhood of -5.

(6) There were no significant sex differences found.
















A NOTE ON METHODS OF TEST VALIDATION 
DAVID G. RYANS 


Fulton, Missouri 


The basal importance of the problem of validity for scientific 
method in general or for any specific procedure devised as an approach 
to the understanding of phenomena is apparent. Physical and 
biological evidence stands or falls on the extent to which the instru- 
ments and modes of attack, through which it is obtained, are capable 
of yielding “true” data. None the less, the efficacy of behavior
sampling is dependent upon the successful development of valid 
measures, or tests. This has been readily recognized, though often 
neglected in practice, during the brief history of measurement in 
psychology and education. As a result, statistical means for deter- 
mining the validity of instruments employed have properly claimed 
the respect of sincere research workers and testers. 

To say that “a test is valid when it measures what it purports to
measure,” while of value, perhaps, for elementary purposes in providing
a superficial distinction between validity and the related function
of reliability, is, from a descriptive standpoint, little more than a play 
on words. One must necessarily be more explicit. Some answer to 
the question, “when does a test measure what it is meant to measure?”
must be provided if the concept is to have any practical significance 
for the student in the field of measurement. 

The essential viewpoint regarding validity in all science, and, of 
course, in the estimation of behavior traits, is that of experimental 
pragmatism. A fact is said to be true if it is demonstrably practical 
when subjected to experimental controls. Similarly, an instrument or 
method is valid, and measures what it is supposed to measure, if the 
practical consequences are satisfactory in properly controlled situa- 
tions. The researcher in education and psychology, and the statistics 
text, alike, aptly refer to these practical consequences as comprising a 
criterion. 

In establishing test validity the fundamental method has com- 
monly been that of correlation—correlation of test data with those of 
a criterion. By calculating the degree of concomitant change of a 
measure compared with a suitable criterion, the measure’s relative 
validity is determinable. In other words, when a common cause may 
be assumed to be significantly operative both in the results supplied 
by the measuring device and in the data which make up the criterion,
the device is said to be a valid indicator of phenomena associated with
the criterion. 
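
As a minimal sketch of validation by correlation with a criterion, the product-moment coefficient between test scores and criterion measures can be computed as follows. The data are hypothetical and serve only to show the computation; the function name is likewise an assumption.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for six persons (not data from any study cited here).
test_scores = [52, 61, 70, 48, 66, 75]
criterion_ratings = [3, 4, 5, 2, 4, 5]

validity_coefficient = pearson(test_scores, criterion_ratings)
print(f"validity coefficient r = {validity_coefficient:.3f}")
```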

Criteria employed in attempting to establish test validity are of 
several sorts. First to be noted, and most satisfactory as a crite- 
rion, is objectively viewed behavior, or performance in a situation 
which is known to demand the kind of response which the instrument 
being studied purports to measure. This may be simply illustrated 
by the subject-matter test in, say, a laboratory science. By calcu- 
lating the coefficient of correlation between (1) a test of knowledge in 
the subject and (2) readily observable techniques and results in the 
laboratory, the validity of the test may be ascertained. A second 
criterion, which may be employed, makes use of well defined ratings 
supplied by competent and reliable judges and based upon generalized 
impressions gained from observation in specific situations. Ulti- 
mately, tests of intelligence have as their validation criterion, ratings 
of judges (often teachers). Actually, and more immediately, many 
so-called intelligence tests have been validated through the correlation 
of their results with those of the Binet scale or one of its revisions. 
This is admittedly indirect. But, assuming that the Stanford Revi- 
sion of the Binet, for example, has been shown to be valid, it would 
appear in order to declare other measures which correlate highly with 
the Stanford-Binet also valid. Thus, other instruments of already 
established validity may be acceptable as criteria to be employed in 
the subsequent validation of another device. 

Besides correlation with a criterion, other manners of determining 
the validity of tests are frequently resorted to. When intercorrela- 
tions between a number of tests, all presumably measuring the trait or 
function in question, are available, the validity of a particular measure 
may be estimated by determining either (a) the correlation coefficient 
between the results of that test and the averages of those of the other 
tests combined, or (b) the average of the correlation coefficients 
between that test and each of the other measures. 
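
The two estimates (a) and (b) just described can be sketched as below. The sketch assumes that the tests are scored on comparable scales, so that a simple average of the other tests is meaningful; in practice the scores would probably be standardized first. All names and data are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Product-moment correlation between two equally long score lists."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validity_vs_composite(scores, target):
    """(a) r between the target test and each person's average on the other tests."""
    others = [name for name in scores if name != target]
    n_persons = len(scores[target])
    composite = [mean(scores[name][i] for name in others) for i in range(n_persons)]
    return pearson(scores[target], composite)


def mean_intercorrelation(scores, target):
    """(b) average of the r's between the target test and each other test."""
    return mean(pearson(scores[target], scores[name]) for name in scores if name != target)


# Hypothetical scores of five persons on three tests presumed to tap one trait.
scores = {
    "test_a": [10, 12, 15, 9, 14],
    "test_b": [11, 13, 14, 8, 15],
    "test_c": [9, 14, 13, 10, 12],
}
print(round(validity_vs_composite(scores, "test_a"), 3))
print(round(mean_intercorrelation(scores, "test_a"), 3))
```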

Still another validation approach sometimes employed depends 
upon the selection of valid items to make up a test, after item-analysis 
and determinations of the differentiation values of individual items in 
light of total performance have been accomplished. This method is, 
perhaps, especially applicable to examinations in the school and to 
personality questionnaires. 
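
One common way of expressing the "differentiation value" of an item against total performance is the difference between the proportions passing the item in the highest and lowest scoring groups. The sketch below is an illustration of that general idea, not the procedure of any particular study; the 27 per cent group size is an assumed convention, and the names and data are hypothetical.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Proportion passing the item in the top group minus that in the bottom group.

    item_correct: 1 if the person passed the item, else 0.
    total_scores: each person's total test score, in the same order.
    fraction: share of persons placed in each extreme group (assumed convention).
    """
    n = len(total_scores)
    group_size = max(1, int(round(n * fraction)))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low_group, high_group = order[:group_size], order[-group_size:]
    p_high = sum(item_correct[i] for i in high_group) / group_size
    p_low = sum(item_correct[i] for i in low_group) / group_size
    return p_high - p_low


# Hypothetical responses of ten persons to one item, with their total scores.
item = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
totals = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
print(discrimination_index(item, totals))
```

Items with an index near zero, or below it, fail to separate high from low scorers and would be candidates for removal.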

It is the primary purpose of this paper to take brief note of one more 
means of determining the validity of a measure. The method here
suggested makes use of multiple factor analysis techniques applied to
the inter-correlations of measures which are suspected of being valid 
indicators of a type of behavior. Obviously this is but an extension, 
taking into account newer statistical treatments, of the correlational 
approaches reviewed earlier. It appears, however, to be more thor- 
oughgoing, and, in light of the clustering of traits and individual 
responses, to offer a more inclusive picture of test validity. 


TABLE I. FACTOR LOADINGS INDICATING THE RELATIVE VALIDITY OF MEASURES
EMPLOYED IN THE STUDY OF PERSISTENT BEHAVIOR¹

                                                            Factor
  Measure                                                  I      II
   1. [name illegible]                                    .38    .19
   2. [name illegible]                                    .52    .10
   3. [name illegible]                                    .38   -.33
   4. [name illegible]                                    .46   -.15
   5. Inhibited free-association                          .26   -.07
   6. [name illegible]                                   -.30   -.18
   7. [name illegible]                                    .39    .30
   8. Continuous mental work                              .30   -.55
   9. [name illegible]                                   -.08    .13
  10. [name illegible]                                    .30   -.21
  11. [name illegible]                                    .01   -.58
  12. [name illegible]                                    .53    .19
  13. [name illegible]                                    .30    .69
  14. Persistence Rating I (Paired Comparisons)           .89    .20
  15. Persistence Rating II (Graphic)                     .83    .32
  16. [name illegible]                                    .74    .30
  17. [name illegible]                                    .29   -.03
  18. [name illegible]                                    .43    .39
      [row label illegible]                              3.93   1.96
      [row label illegible]                              0.21   0.10








Factor analysis is ordinarily thought of as capable of yielding two 
sorts of information: First, indications of the presence or absence of 
general factors, or underlying features common to contributing 
measures; and second, indices of the extent to which a particular 
measure is weighted or loaded with a factor that is commonly operative.
It is necessary to consider both of these functions in discussing
test validation by factor methods. In the first place, measures must 
be employed which have been empirically chosen and accepted as ones
demanding the type of response in question. And, for purposes of
identification of possible factors, as well as for their contributory value, 
criteria (as previously described) should be included among the 
measures entering into the correlation table. These conditions 
having been met, it seems to follow that the factor loadings (of a 
factor identified by the criteria), which result from analysis of the 
intercorrelations, will serve as indicators of the relative validity of the 
measures to which they apply. 

The writer, in a recent attempt to study persistent behavior¹ and
to develop a persistence test,² employed the method of multiple factor
analysis in determining the comparative validity of sixteen different 
situations, such as continued effort at specific tasks, inhibition of 
reflexes and well-fixed habits, physical endurance, resistance to dis- 
traction and to suggestion, scholastic achievement, and ratings on 
perseverance, as measures of persistent behavior, or continued energy 
release. 

Tables of inter-correlations were set up according to the center of
gravity method of factor analysis, described and outlined by Thurstone
in his pamphlet entitled A Simplified Multiple Factor Method.³
Persistence ratings and intelligence indices were included in the 
correlational matrix, as criteria of possibly existing behavior factors, 
along with measures which had been chosen for their apparent rela- 
tions to persistent behavior. The results of the multiple factor 
analysis are given in Table I. 
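
For readers unfamiliar with the center of gravity (centroid) procedure, its first step can be sketched as follows. This is one common reading of the method and not a transcription of Thurstone's pamphlet: the diagonal cells are filled with the largest correlation in each column as a communality estimate, and each first-factor loading is the column sum divided by the square root of the grand total. The function names and the small correlation table are hypothetical, and the sketch assumes a positive grand total (no sign reflection is attempted).

```python
import math


def with_diagonal_estimates(r):
    """Copy of r with each diagonal cell set to the largest absolute
    off-diagonal correlation in its column (an assumed communality estimate)."""
    n = len(r)
    out = [row[:] for row in r]
    for j in range(n):
        out[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    return out


def first_centroid_loadings(r):
    """First centroid loadings: column sums divided by sqrt of the grand total."""
    n = len(r)
    column_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    root_total = math.sqrt(sum(column_sums))
    return [s / root_total for s in column_sums]


def residual_matrix(r, loadings):
    """Correlations left after removing the factor: r_ij - a_i * a_j."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]


# Hypothetical inter-correlations among three measures (diagonal left at zero).
table = [
    [0.00, 0.60, 0.50],
    [0.60, 0.00, 0.40],
    [0.50, 0.40, 0.00],
]
reduced = with_diagonal_estimates(table)
loadings = first_centroid_loadings(reduced)
print([round(a, 3) for a in loadings])
```

A second factor would be sought in the residual table after reflecting the signs of some measures, a step omitted here.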

A survey of this table of factor loadings shows the ratings on 
persistence and the intelligence test results to be weighted to an 
appreciable degree respectively with the first and second factors which 
emerged in the study. Thus, it does not seem out of order to call 
Factor I, a persistence factor, and Factor II, an intelligence factor. 
Nor does it appear amiss to designate the validities of the different 
measures which are contributed to by a factor in terms of the factor- 
loadings of those measures. Thus, persistence appears to enter into 
such measures as Persistence ratings, Honor-point ratio, Study log, 
Number of anagrams (obtained), Study time, Suggestion resistance, 
etc. in decreasing amounts. The higher the coefficient, the more 
valid is the measure as indicated by this method. It is noteworthy, 
though incidental, to add that the validity of the test-situations, 
selected in this manner, was borne out in more general behavior pat- 
terns demanding persistence by results subsequently obtained. 
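
The ordering of measures by their Factor I loadings can be reproduced from Table I with a few lines. Measures are identified by row number only, because most names are illegible in the scanned table; the loading of row 11 on Factor I is read here as .01, which is an assumption about a damaged entry. The sum of squared Factor I loadings comes to about 3.94, which appears to agree, within rounding, with the first unlabeled summary figure (3.93) printed beneath the table; that reading of the summary row is an inference, not something the table states.

```python
# (row number, Factor I loading, Factor II loading) as read from Table I.
loadings = [
    (1, 0.38, 0.19), (2, 0.52, 0.10), (3, 0.38, -0.33), (4, 0.46, -0.15),
    (5, 0.26, -0.07), (6, -0.30, -0.18), (7, 0.39, 0.30), (8, 0.30, -0.55),
    (9, -0.08, 0.13), (10, 0.30, -0.21), (11, 0.01, -0.58), (12, 0.53, 0.19),
    (13, 0.30, 0.69), (14, 0.89, 0.20), (15, 0.83, 0.32), (16, 0.74, 0.30),
    (17, 0.29, -0.03), (18, 0.43, 0.39),
]

# Rank the measures from highest to lowest loading on Factor I.
ranked = sorted(loadings, key=lambda row: row[1], reverse=True)
for row_number, factor_1, _ in ranked:
    print(f"row {row_number:2d}: Factor I loading {factor_1:+.2f}")

# Sum of squared loadings on Factor I.
sum_sq_factor_1 = sum(factor_1 ** 2 for _, factor_1, _ in loadings)
print(f"sum of squared Factor I loadings = {sum_sq_factor_1:.2f}")
```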

These data and the method of validation employed are presented 
with a word of care. In the first place, factor loadings, or indices of
validity, will hold good only when the original measures possess sufficient
reliability. Thus, the correlation coefficients comprising the
table subjected to analysis should, if possible, be ones which have 
been corrected for attenuation. Further, the status of factor analysis 
is not a completely settled matter today. The instability of factors 
and factor loadings with the extension and limitation of the correla- 
tional matrix, and the variance of results when different methods of 
analysis are employed have made for some uncertainty in the field. 
Nevertheless, when certain cautions⁴ and controls are observed, there
is little doubt in the writer’s mind of the usefulness of factor techniques 
both for the isolation of general factors, and for establishing the 
validity of measures under consideration. 
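
The correction for attenuation mentioned above is ordinarily made with Spearman's formula, in which the observed correlation is divided by the square root of the product of the two reliabilities. A minimal sketch follows; the function name and the numerical values are hypothetical.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimated correlation between true scores: r_xy / sqrt(r_xx * r_yy)."""
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed r of .40 between measures with reliabilities .64 and .81.
print(round(correct_for_attenuation(0.40, 0.64, 0.81), 3))
```

With unreliable measures the corrected value can exceed 1.00, which is one reason the correction is applied only where the reliabilities themselves are well determined.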


SUMMARY 


Along with other methods, the application of factor analysis 
techniques is suggested as an approach to the validation of behavior 
measures. The procedure includes the following steps:


1. Selection of a sufficiently large number of measures involving situations 
presumed to demand the trait in question. When satisfactory criteria are 
available they should be included along with the other measures. 

2. Computation of coefficients of correlation between each of the measures 
and every other. The reliabilities of individual measures should be taken 
into account by correcting for attenuation, if possible. 

3. Setting up a table of inter-correlations of the measures included and 
proceeding with the factor analysis according to an approved method. 

4. Determination of the presence of factors, if any exist, and identification 
of that in question through reference to the loadings of the criterion, or 
criteria. 

5. Determination of the relative validities of measures by noting the
extent to which they are contributed to by the factor identified through the
criterion.


BIBLIOGRAPHY 


1. Ryans, D. G.: “An Experimental Attempt to Analyze Persistent Behavior.
I. Measuring Traits Presumed to Involve ‘Persistence.’” J. Gen. Psychol.,
1938, Vol. XIX, pp. 333-353.

2. Ryans, D. G.: “An Experimental Attempt to Analyze Persistent Behavior.
II. A Persistence Test.” J. Gen. Psychol., 1938, Vol. XIX, pp. 355-371.

3. Thurstone, L. L.: A Simplified Multiple Factor Method. Chicago: University
of Chicago Press, 1933.

4. Thurstone, L. L.: “Current Misuse of the Factorial Methods.” Psychometrika,
1937, Vol. II, pp. 73-76.














BOOK REVIEW 


Harry N. Rivlin. Educating for Adjustment: The Classroom Applications
of Mental Hygiene. New York: D. Appleton-Century Co.,
1936, pp. xiv + 419.


Rivlin’s book could be one of the most significant educational 
books of the decade. That it probably will not achieve recognition 
as such will be due entirely to the elaborate machinery of teacher-training.
The teacher-training institutions and state licensing
boards have succeeded in building up elaborate (if sometimes over- 
lapping) requirements which, to the astonishment of the lay observer, 
succeed in almost totally neglecting any consideration of the child. 
Technical courses in the minutiae of methods, organization, administra- 
tion, etc. leave little time for study of the child as a child, in contrast 
to the child as that-to-which-the-teacher-teaches. The result is that 
the teacher once in service meets problems with which his courses in 
education have not prepared him to deal. It is just this lack in the 
teacher’s training that Educating for Adjustment could very successfully 
remove. 

In apparent recognition of the usual teacher’s inadequate under- 
standing of the psychology of children and their problems, Rivlin 
devotes his first seven chapters to a very admirable presentation of 
the pertinent psychology of child adjustment. The next eight 
chapters describe in a practical way how the classroom teacher can 
use such psychology in understanding and dealing with his children. 
The last chapter is devoted to the ever important problem of the 
teacher’s own personal adjustment. 

As a textbook for the student, or for the teacher in service, this 
book is excellent. The writing is lucid. The selection of materials 
and their organization evidence an understanding of the teacher’s 
problems. Exclusive emphasis upon one psychological point-of-view 
—the all too frequent weakness of most books on mental hygiene—is 
entirely absent here. The three major schools of psychoanalysis, 
behaviorism and Gestalt psychology are all simply explained. In the 
chapters particularly concerned with practical classroom problems
the author is careful to avoid the extreme behavior which should be 
dealt with by the psychologist or psychiatrist. He thinks in the 
teacher’s terms, and very specifically suggests to the teacher methods 
of maintaining a mental hygiene attitude. The case problems 
presented with each chapter are taken directly from the school room— 
every teacher should find study of these cases stimulating and helpful. 

C. M. LOUTTIT.
Indiana University. 
















